

Workshop  Notes 


10th  Annual  Workshop  on 
Interconnections  Within  High 
Speed  Digital  Systems 

9-12  May  1999 


Hilton of Santa Fe
Santa  Fe,  New  Mexico 


Sponsored by the IEEE Lasers & Electro-Optics Society and
in  cooperation  with  the  IEEE  Computer  Society  and  the 
IEEE  Communications  Society 


DISTRIBUTION  STATEMENT  A 

Approved  for  Public  Release 
Distribution  Unlimited 



REPORT DOCUMENTATION PAGE


1. AGENCY USE ONLY (Leave Blank)

2. REPORT DATE
12 May 1999

3. REPORT TYPE AND DATES COVERED
Technical

4. TITLE AND SUBTITLE
10th Annual Workshop on Interconnection within High Speed Digital Systems

5. FUNDING NUMBERS

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)
IEEE-LEOS
445 Hoes Lane
Piscataway, NJ 08855-1331

8. PERFORMING ORGANIZATION REPORT NUMBER

9. SPONSORING / MONITORING AGENCY NAME(S) AND ADDRESS(ES)
AFOSR
801 North Randolph Street, Room 732
Arlington, VA 22203-1977

10. SPONSORING / MONITORING AGENCY REPORT NUMBER
F49620-99-1-0300

12a. DISTRIBUTION / AVAILABILITY STATEMENT

12b. DISTRIBUTION CODE

13. ABSTRACT (Maximum 200 words)
10th Annual Workshop on Interconnections within High Speed Digital Systems

14. SUBJECT TERMS
Interconnections, systems architectures, electronic,
optoelectronic, and optical interconnections

15. NUMBER OF PAGES
116

16. PRICE CODE

17. SECURITY CLASSIFICATION OF REPORT
unclassified

18. SECURITY CLASSIFICATION OF THIS PAGE
unclassified

19. SECURITY CLASSIFICATION OF ABSTRACT
unclassified

20. LIMITATION OF ABSTRACT
UL

Standard Form 298 (Rev. 2-89)
Prescribed by ANSI Std. Z39-18
298-102


Program  Committee 


Workshop  Chair 

Howard  Davidson 
Sun Microsystems
Sunnyvale,  CA 

Program Chair

Philippe  Marchand 

UCSD 

La  Jolla,  CA 

Tutorials  Chair 

George Papen
University of Illinois
Urbana,  IL 

Working Group Chair

Ashley Saulsbury

Sun Microsystems
Sunnyvale,  CA 


Technical  Program  Committee 


Marc  Christensen 

George  Mason  University,  Fairfax,  VA 
Kirk  Giboney 

Hewlett  Packard  Laboratories,  Palo  Alto,  CA 
Anthony  Lentine 

Lucent  Technologies  Bell  Laboratories,  Holmdel,  NJ 

John  Levy 

Cisco,  San  Jose,  CA 
Tulin  Mangir 

TM  Associates,  Santa  Monica,  CA 
John  Poulton 

University of North Carolina, Chapel Hill, NC
Harold  Stone 

NEC Research Center, Princeton, NJ


Working Group Committee


Lew Aronson
Hewlett Packard Laboratories, Palo Alto, CA

Giorgio Giaretta
Lucent Technologies Bell Laboratories, Holmdel, NJ

Charles Kuznia
University of Southern California, Los Angeles, CA

Rick Lytel
Sun Microsystems, Sunnyvale, CA

Henk Neefs
University of Gent, Gent, BELGIUM

Steve Tam
Cisco, San Jose, CA


International Liaisons

Peter DeDobbelaere
Akzo Nobel
Sunnyvale, CA

Henk Neefs
University of Gent
Gent, BELGIUM

Osamu Wada
Fujitsu Laboratories
Atsugi, JAPAN


Workshop  Scope 


The  continuing  rapid  increase  in  the  performance  of  high  speed  electronics  and  communications 
technologies  has  led  to  dramatic  improvements  in  advanced  computing  and  communications 
systems.  The  rapid  growth  of  computer  internetworking  and  the  rise  of  new  applications  such  as 
multimedia  and  virtual  reality  are  driving  the  requirements  for  still  higher  levels  of  computing  and 
communications.  Interconnections  within  digital  computing  and  switching  systems  today  are  often 
perceived  as  a  performance  bottleneck.  The  purpose  of  this  Workshop  is  to  determine  the 
interconnection  requirements  of  emerging  and  future  computer  and  communications  systems  and 
to  disseminate  information  about  state-of-the-art  optical  and  electrical  interconnection 
technologies  at  the  component,  packaging,  and  systems  level. 

Because  of  the  multi-disciplinary  nature  of  these  problems,  this  Workshop  brings  together 
researchers  and  engineers  with  expertise  in  a  variety  of  fields  including  electronic,  optoelectronic, 
and  optical  interconnection  technologies,  advanced  systems  architectures  as  well  as  the  systems 
level  perspective  of  algorithms  and  applications.  The  Workshop  is  comprised  of  tutorials  and 
invited talks of the highest caliber as well as a few contributed papers. In addition, all attendees
participate  in  smaller  working  groups  to  discuss  and  address  a  central  focus  design  problem. 
Working  groups  are  diverse  and  multi-disciplinary.  In  the  past,  problems  ranging  from  high- 
performance  workstation  design  to  tele-medicine  applications  have  been  considered.  Historically, 
this  workshop  has  provided  a  stimulating,  highly  interactive  environment  conducive  to  thought- 
provoking  discussions.  Take  advantage  of  this  opportunity  to  contribute  to  a  great  Santa  Fe 
experience! More information can be found on the web at http://soliton.ucsd.edu/ihsds/santafe99


SUNDAY,  9  MAY  1999 


3:00pm - 6:15pm Tutorial Session

Session Chair: George Papen, University of Illinois, Urbana, IL
3:00pm - 3:45pm Tutorial - 1

Device and Interconnect Technologies for ~100 GHz Mixed-Signal ICs, Mark Rodwell, UC Santa Barbara, Santa Barbara, CA
160 Gb/s TDM optical links will require ICs with > 150 GHz analog bandwidth and an 80 or 160 GHz clock. Mixed-signal ICs (DACs/DDSs/ADCs)
for digital processing of 2-20 GHz radar signals will have 2000-transistor complexity and ~100 GHz clock rates. To permit clock rates exceeding
100 GHz, transistor current-gain (ft) and power-gain (fmax) cutoff frequencies must be several hundred GHz. The interconnects must have small
capacitance per unit length, and wire lengths, hence transistor spacings, must be small. Given that fast transistors operate at high current densities,
effective heatsinking is essential. To prevent circuit-circuit interaction through common-lead inductance ("ground bounce"), low wiring
ground-return inductance is required within the IC and between IC and package. We report a transferred substrate heterojunction bipolar transistor (HBT) IC
technology providing scalable submicron HBTs with record 250 GHz ft and 820 GHz fmax. The interconnects, microstrip on a low-epsilon
dielectric, have low capacitance and high velocity and a ground plane for low ground-return inductance. An electroplated Au/Ni/Cu metal substrate
with Au thermal vias provides effective heatsinking. Demonstrated ICs include 85 GHz amplifiers and 60 GHz M/S latches. To manage power-delay
products in larger circuits, low-voltage-swing (nkT/q) circuits are being investigated.

3:45pm  -  4:30pm  Tutorial  -  2 

Overview  of  Nonlinear  Optics  for  High  Speed  Communication,  Bahaa  Saleh,  Boston  University,  Boston,  MA 
4:30pm  -  4:45pm  Break 
4:45pm  -  5:30pm  Tutorial  -  3 

Advances  in  Chip  Level  Packaging,  John  Carson,  Irvine  Sensors  Corporation,  Costa  Mesa,  CA 

Two  major  directions  in  chip  level  packaging  will  be  observed  during  the  next  decade:  thinner  packages  (and  therefore  thinner  chips)  and  more  direct 
chip attach techniques. Package thickness will be pushed to as low as 0.5 mm for various applications enabled by aggressive chip thinning techniques.
In direct chip attach, peripheral leads in a footprint smaller than the IC carrier will appear in mainstream applications limited only by printed circuit
board constraints. Combined, these two trends will drive toward increased use of three dimensional stacking techniques. Examples of thinned chips
on flexible substrates and three dimensional assemblies of multi-chip packages are shown to portend these coming events.

5:30pm  -  6:15pm  Tutorial  -  4 

Modeling,  Analysis  and  Simulation  of  Data  Networks,  Yusuf  Ozturk,  San  Diego  State  University,  San  Diego,  CA 

The first topic will be oriented toward network modeling and analysis. I can demonstrate some traffic collection tools and then show how to incorporate the
collected data into commercial simulation tools. We can work through practical problems in capacity planning and projections to the future, which fits
well into a workshop program. This talk will reflect my experience and the common mistakes that network managers and analysis specialists make
during data collection, analysis, and simulation of their networks. The tutorial will be mostly a demonstration of the network design process,
starting from data source characteristics and continuing through network topology selection, modeling, and analysis.

6:30pm  -  7:30pm  Welcome  Reception 

8:00pm  -  8:30pm 

Kickoff  Meeting  for  Working  Group  Leaders 

Session Chair: Ashley Saulsbury, Sun Microsystems, Mountain View, CA


MONDAY, 10 MAY 1999


8:00am  -  8:15am  Workshop  Welcome 

Workshop  Chair:  Howard  Davidson,  Sun  Microsystems,  Mountain  View,  CA 

Session:  Short  Haul  Interconnects 

Session  Chair:  Kirk  Giboney,  Hewlett  Packard  Laboratories 

8:15am  -8:45am 

1.1  Overview  of  10Gbit  Ethernet,  Peter  Wang,  3COM  Technology  Development  Ctr.  Santa  Clara,  CA 

Internet traffic is exploding. Intranet, extranet, e-commerce and Voice-over-IP are all contributing to the growth of data networks. Gigabit Ethernet
deployment  is  ramping  up,  as  are  broadband  access  networks.  Carriers  are  planning  for  multi-gigabit  backbone  deployment.  Dense  Wavelength 
Division  Multiplexing  is  the  talk  of  the  town.  Are  there  alternatives?  Is  the  world  ready  for  10  Gb/s  networking? 

This  talk  will  explore  the  key  enabling  technologies  and  the  various  interconnect  options  for  constructing  10  Gb/s  links  for  the  next  generation 
backbone. We will also touch on the challenges of building switching infrastructure for the 10 Gb/s data networks.


8:45am - 9:15am

1.2 PAROLI, a Synchronous Optical Interconnection Link with a Throughput of 13 Gbit/s, Karsten Droegemueller,

Siemens AG, GERMANY

Data communication and telecom switching systems require interconnections with high density, high data throughput and low power consumption.
The design, realization, and characterization of a multichannel parallel optical interconnection with a 12-fiber ribbon and an optical data rate of
1.25 Gbit/s per channel are reported. Two versions will be presented: first, a bit-synchronous link with an electrical interface consisting of 22
differential data channels operating at 500 Mbit/s each plus one clock channel; second, an asynchronous link with 12 electrical differential data
channels at 1.25 Gbit/s each. On the transmitter side a vertical-cavity surface-emitting laser (VCSEL) array is employed as the light source. Results of
reliability tests of the VCSELs are given in the presentation.

9:15am - 11:30am

Session: Intra-System Interconnects

Session Chair: Rick Lytel, Sun Microsystems, Mountain View, CA

9:15am - 9:45am

1.3 Tb/s Chip I/O - How Close are we to Practical Reality?, Rick Walker, Hewlett Packard Laboratories, Palo Alto, CA

Computer and router designers are counting on Tb/s chip-to-chip data transmission capability to continue expanding their system performance to
meet the global demand.

Several  prototype  serial  links,  with  clock  and  data  recovery,  have  been  published  at  2-10  Gb/s  data  rates  per  pin.  Much  work  is  focussed  on  lowering 
the  power  and  size  of  these  links  to  allow  hundreds  of  links  to  be  integrated  onto  a  single  chip. 

Even  with  these  advances,  some  scary  system  issues  still  remain.  Power  supply  noise  can  have  disastrous  effects  on  PLL  and  DLL  performance. 
Signal  crosstalk  can  close  up  an  otherwise  open  eye.  Each  advance  in  CMOS  scaling  reduces  the  analog  circuit  options  available  to  the  link  designer. 

The copper signal-transmission infrastructure is not improving at anywhere near a "Moore's law" rate. FR4 has been the standard dielectric for
high-density PCBs for over 20 years, and coax cables are an extremely mature art. Dielectric and skin loss limit data rates to approximately 10 Gb/s, and
further advances may be slow in coming.

This talk will explore these trends and attempt to forecast the future of high-speed serial interconnects.

9:45am  -  10:00am  Coffee  Break 
10:00am  -  10:30am 

1.4  Interconnect  Requirements  for  Digital  Cross-connect  Systems,  Roger  Holmstrom  and  Robert  Ward,  Tellabs  Operations  Inc.,  Lisle,  IL 
The  requirements  for  high-speed  and  high-density  board-to-board  interconnects  in  digital  cross-connect  systems  are  such  that  new  and  emerging 
technologies are sought. These requirements are discussed in terms of physical, performance, reliability, and cost metrics. Some alternatives are
evaluated.  For  long  reach  interconnects,  parallel  optics  are  favored.  For  short  reach  interconnects,  electrical  interconnects  are  chosen. 

10:30am  -  11:00am 

1.5  Moore's  Law:  The  Intra-system  I/O  Challenge,  Craig  Theorin,  W.  L.  Gore  &  Associates,  Lompoc,  CA 

The modularity of recent high speed digital system designs has created the need for intra-system I/O bandwidth in excess of 10 Gbit/sec. Most
system architects anticipate this bandwidth requirement to scale with Moore's law for the foreseeable future, creating a substantial signal integrity
challenge  for  future  data  links.  We  will  describe  the  chip-to-chip  signal  integrity  concerns  and  likely  solutions  for  intra-system  I/O  in  the  early  part 
of  the  next  millennium  as  aggregate  bandwidths  scale  beyond  100  Gbit/sec. 
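
As a rough illustration of the scaling argument in abstract 1.5, the sketch below estimates how long it would take intra-system I/O bandwidth to grow from 10 Gbit/s to 100 Gbit/s at a fixed doubling interval. The 18-month doubling period and the function name are assumptions made for illustration; the abstract only states that the requirement is expected to scale with Moore's law.

```python
import math

def years_to_scale(start_gbps: float, target_gbps: float, doubling_years: float) -> float:
    """Years needed for bandwidth to grow from start to target at a fixed doubling period."""
    doublings = math.log2(target_gbps / start_gbps)
    return doublings * doubling_years

# Assumed doubling period of 1.5 years (hypothetical, not taken from the talk).
years = years_to_scale(10.0, 100.0, doubling_years=1.5)
print(f"10 -> 100 Gbit/s takes about {years:.1f} years")
```

Under that assumption the tenfold increase takes roughly five years, which is consistent with the abstract's "early part of the next millennium" time frame.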

11:00am  -  11:30am 

1.6 DDR and RAMBUS (High Speed Bus) DRAM, Mian Quddus, Samsung, KOREA
11:00am  -  11:30am 

Workshop Problem Statement for the 1999 Workshop:

Ashley Saulsbury, Sun Microsystems, Mountain View, CA

12:00pm - 1:30pm Luncheon & Working Group Session I
1:30pm - 3:30pm Working Group Session II
3:30pm  -  6:30pm  Free  Afternoon 
6:30pm  -  7:30pm  Reception 

8:30pm  -  9:30pm  Special  Event  10  Years  of  Santa  Fe  Experience 
Speaker:  Harold  Stone,  NEC,  Princeton,  NJ 


TUESDAY, 11 MAY 1999


8:15am  -  9:45am 

Session:  Optical  Interconnects  for  High-performance  Computing  Systems 

Session Chair: Harold Stone, NEC, Princeton, NJ

8:15am  -8:45am 

2.1 Interconnects in Scalable, Distributed Multiprocessor Systems, Jeffrey Kuskin, Silicon Graphics, Inc., Mountain View, CA
Communication  among  processing  nodes  (that  is,  CPUs,  memories,  and  I/O  devices)  is  perhaps  the  key  component  in  the  design  of  a  multiprocessor 
system.  Traditionally,  multiprocessors  have  been  constructed  by  connecting  a  small  number  of  processing  nodes  to  a  common,  shared  bus.  The 
shared  bus  provides  not  only  a  mechanism  for  the  processing  nodes  to  communicate,  but  also  allows  all  communication  to  be  broadcast  to  all  nodes 
on  the  bus. 

The broadcast capability of a shared bus greatly simplifies the overall system design. Unfortunately, electrical and mechanical constraints severely
limit  the  number  of  nodes  that  a  single  shared  bus  can  support.  For  this  reason,  multiprocessors  that  scale  to  large  numbers  of  processing  nodes  do 
away with the single shared bus and instead employ a distributed system design in which processing nodes are interconnected via a high-bandwidth,
low-latency,  switched  routing  fabric. 

The  use  of  a  routing  fabric  overcomes  the  scalability  limitations  of  a  shared  bus,  but  introduces  a  number  of  complications  of  its  own.  This  talk  will 
explore  the  use  of  high-performance  interconnects  in  a  distributed  multiprocessor  system.  We  begin  with  a  short  discussion  of  the  basic  distributed 
multiprocessor  node  architecture  and  interconnection  fabric,  and  the  difficulties  that  such  an  architecture  creates.  We  then  describe  how  these 
problems are solved in practice, with an emphasis on the role of the interconnection network. We conclude with some thoughts on the increasing
importance  of  communication  in  multiprocessor  system  designs  and  the  demands  that  will  be  placed  on  future  multiprocessor  interconnection 
networks. 

8:45am  -9:15am 

2.2  The  Role  of  Optics  in  Balanced  Computer  System  Design,  Mike  Chastain,  Hewlett-Packard  Company,  Richardson,  TX 

Computer system architects have been waiting and watching the development of parallel optics for five years or more, hoping that breakthroughs in
the producibility, packaging, and resultant costs would finally make optical links cost competitive with their copper counterparts. The inherent
advantages of optical interconnects are well known. The inherent reduction in physical size of both connectors and cables, increased usable
communication distance, and reduced susceptibility to EMI and EMC have always been appealing; but the costs have always forced designers to do it in
copper just one more time.

In  the  last  few  years  we  have  all  witnessed  the  almost  exponential  climb  in  CPU  clock  rates,  soon  to  break  the  gigahertz  barrier,  and  we  have  seen 
virtually  every  performance  feature  ever  implemented  in  the  fastest  supercomputers  migrate  to  single  chip  CPUs.  Along  with  these  improvements  has 
come an equally impressive increase in the data bandwidths required to keep these CPUs in execution. Soon we may see single chip CPUs capable
of consuming 10 gigabytes per second or more.

These enormous bandwidths are forcing CPU and ASIC designers alike to push every integrated circuit connection to the maximum frequency in
order to maintain pin counts at manufacturable levels. Intel's recent switch to Rambus DRAM is a clear indication that all vendors, even PC vendors,
are  faced  with  this  problem. 

As  we  increase  the  frequency  of  these  products  we  will  test  the  limits  of  printed  circuit  technology.  Skin  effect  losses  are  already  a  problem,  and 
within a few years these losses will be replaced in priority by dielectric losses, perhaps limiting the usable connection distance to a single backplane
or  PC  planar.  Cables  with  very  good  dielectrics  will,  of  course,  allow  longer  distance;  but  there  will  always  be  copper  trace  in  the  path.  As 
frequencies  increase,  every  4  to  5  inches  of  copper  trace  will  reduce  the  usable  cable  length  by  about  three  feet  in  addition  to  decreases  due  to  the 
cable  dielectric  losses.  We  may  do  well  to  connect  adjacent  racks  with  copper  cables,  let  alone  cross  machine  rooms. 

So  it  appears,  within  a  few  years,  that  copper  interconnects  of  more  than  a  few  meters  could  easily  become  very  expensive,  while  VCSEL  technology 
and  creative  packaging  may  finally  yield  cost  effective  parallel  optic  interfaces. 

The inevitable shift to parallel optical technology may occur because of this juncture of overstressed copper and mass-produced optics, but there is
still a major disconnect between the future needs of computer systems and the roadmap of optical components. Today the optical roadmap is driven
by the telecommunications industry, which seems to be increasing communication frequencies in a fixed '4x' pattern from 622 MHz, to 2.5 GHz, to
10.0 GHz, while the computer industry tends to take smaller '2x' increments. The preferred frequency pattern for the computer industry must include
2.5 GHz and 5 GHz. These frequencies are especially important for computer/optic integration because many technologists now believe the useful limit
of printed circuit technology is around 5 GHz. Beyond this frequency, i.e. at 10 GHz, it may be impossible to build a reasonable size backplane.
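
The mismatch between the two roadmaps described in abstract 2.2 can be made concrete with a small sketch. It generates a telecom-style ladder that quadruples at each step and a computer-style ladder that doubles, then lists the doubling steps that fall at or below the roughly 5 GHz printed-circuit limit quoted above. The 0.625 GHz starting point of the doubling ladder and all names are assumptions chosen so that the ladder passes through 2.5 GHz and 5 GHz; they are not from the talk.

```python
def rate_ladder(start_ghz: float, factor: int, steps: int) -> list[float]:
    """Return `steps` successive rates, each `factor` times the previous one."""
    return [start_ghz * factor ** i for i in range(steps)]

PCB_LIMIT_GHZ = 5.0  # "around 5 GHz" per the abstract

# Telecom roadmap: 622 MHz, then roughly 2.5 GHz and 10 GHz (4x steps).
telecom = rate_ladder(0.622, 4, 3)

# Hypothetical computer roadmap with 2x steps (starting point is an assumption).
computer = rate_ladder(0.625, 2, 5)

# Doubling steps that a printed circuit backplane could still carry.
usable_on_pcb = [r for r in computer if r <= PCB_LIMIT_GHZ]

print("telecom :", [round(r, 2) for r in telecom])
print("computer:", computer)
print("usable  :", usable_on_pcb)
```

In this sketch the 4x ladder jumps from about 2.5 GHz straight past the 5 GHz limit, while the 2x ladder includes a step exactly at it, which is the gap the abstract points to.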

9:15am - 9:45am

2.3 In Pursuit of a Petaflop: Overcoming the Latency/Bandwidth Wall, Peter Kogge, Notre Dame University

The fastest machines on the planet today peak at around a teraflop (10^12 floating point operations per second), with plans over the next few years to
approach  10-30  TF.  This  performance,  however,  is  still  insufficient  for  several  important  classes  of  applications.  Performance  levels  of  a  Petaflop 
(10^15  flops)  thus  become  a  valuable  target  to  aim  for.  Unfortunately,  achieving  this  with  current  conventional  technology  and  architecture  seems  to 
be  difficult,  and  destined  to  wait  for  the  2010-2015  timeframe. 

The  twin  demons  in  this  wall  appear  to  be  latency  and  bandwidth:  getting  enough  data  to  the  right  processing  logic  in  a  timely  enough  fashion  that 
the logic can be kept profitably busy, and doing so in a fashion that the amount of parallelism that must be found in an application is acceptable.


This talk will address one approach to breaking this wall: the HTMT project (Hybrid Technology MultiThreaded architecture). This multi-institution
collaboration  is  in  the  middle  design  phase  of  a  long-term  effort  started  in  1994  to  find  alternatives  to  conventional  architectures  and  relevant 
technologies,  and  if  successful,  will  result  in  a  petaflops  level  machine  by  around  2006. 

The  solutions  used  by  HTMT  encompass  both  technology  and  architecture.  In  technology,  superconducting  logic,  very  fast  WDM  all  optical 
networks, Processing-In-Memory (PIM), and 3D holographic storage form the basic underpinnings for a radically different machine. In architecture,
multithreading, active memory, and automatic percolation of data throughout a very deep memory hierarchy all are central players.

This  talk  will  overview  the  inherent  problems  associated  with  achieving  a  petaflops,  and  discuss  the  architecture  of  the  current  HTMT  design. 
Although  all  aspects  of  the  machine  will  be  discussed,  emphasis  will  be  placed  on  the  active  memories,  where  PIM  technology  coupled  with  the 
concepts  of  percolation,  allow  massive  parallelism  in  the  memory  system  to  execute  large  portions  of  an  application  in  ways  that  defeat  the 
bandwidth/latency barriers formed by conventional approaches.

9:45am  -  10:15am 

2.4  Ultra-High  Speed  Optical  Interconnection  Network  for  Supercomputing,  Keren  Bergman,  Princeton  University,  Princeton,  NJ 
In an attempt to effectively utilize the immense bandwidth of optical fiber interconnects, we designed a completely novel network architecture
specifically for optical implementation. This work is part of an aggressive multidisciplinary architecture study of the next generation high
performance computing based on hybrid technologies and multi-threaded (HTMT) latency management. The optical network employs multiple node
levels with a routing topology that is based on a minimum logic at the node scheme. Our architecture features radically new traffic control logic,
having the property that all routing decisions for the self-routing data packets are based on a single logic operation at each node. The optical
network, named the Data Vortex, can scale to interconnect an ultra-high performance computing system in a massively parallel form. Within the
framework of the Data Vortex network we are investigating enabling fiber optic technologies and an implementation that consists of fiber
interconnects  with  wavelength  division  multiplexed  and  time  division  multiplexed  (WDM/TDM)  payload  and  header.  The  development  and 
incorporation  into  the  network  of  fiber  optic  modules  including  high  speed  fiber  lasers,  amplifiers,  and  switching  nodes  will  be  discussed  in  this  talk. 

10:15am  -  10:30am  Coffee  Break 

10:30am  -  12:00noon 

Session:  Optical  Networks 

Session  Chair:  Tulin  Mangir,  TM  Associates,  Santa  Monica,  CA 


10:30am  -  11:00am 

2.5 Ultrafast Optical Interconnect Based on Routing by "Clockwork" in Regular Mesh Networks, David Cotter, British Telecom, UK,

F. Chevalier, and D. Harle, University of Strathclyde, UK

The effectiveness of multi-processor systems (such as future massive-capacity routers and servers) is critically dependent on the speed and efficiency
of interconnection. Full connectivity is required with large message throughput and minimal delay. An option under consideration is an ultra-high
speed multi-stage packet-switched network, using fixed-length packets at serial bit rates of 0.1-1 Tbit/s. The packets are routed through the network
on optical pipes, with routing and digital header processing (such as destination address recognition) performed 'on the fly' in the optical domain.

A  key  requirement  for  high  performance  is  that  the  routing  mechanisms  and  processing  at  network  nodes  should  be  as  simple  as  possible.  Here  we 
describe  a  new  strategy  for  routing  in  regular  mesh  interconnection  networks,  based  on  a  method  of  automatic  global  scheduled  ('clockwork') 
switching  in  the  optical  domain.  Using  this  strategy,  the  intermediate  routing  nodes  are  merely  needed  to  perform  an  extraordinarily  simple  function 
('for-me-or-not-for-me' header-address recognition), otherwise traffic is routed onwards automatically in the optical domain with absolutely no
further  intelligent  action  performed  by  the  node.  The  throughput  is  comparable  with  conventional  store-and-forward  packet  switching,  yet  the 
simplicity of this strategy makes it suitable for implementation in digital optical logic. The clockwork approach enables some special capabilities,
such as ultra-low latency signalling, bandwidth reservation, ultra-low response delay, and process scheduling.
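
To make the node function in abstract 2.5 more tangible, here is a deliberately simplified toy model. It is not the authors' clockwork schedule: it assumes a unidirectional ring instead of a regular mesh, ignores contention, and uses hypothetical names throughout. It only shows the idea that each node performs a single 'for-me-or-not-for-me' test while forwarding happens on a fixed global time-slot schedule with no per-packet route computation.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    dest: int      # header address examined by each node
    payload: str

def run_ring(num_nodes, injections, max_slots=100):
    """Advance every packet one hop per global time slot on a unidirectional ring.

    injections: list of (source_node, Packet).
    Returns {payload: (delivering_node, hops_taken)}.
    """
    in_flight = [(src, pkt, 0) for src, pkt in injections]  # (position, packet, hops)
    delivered = {}
    for _ in range(max_slots):
        if not in_flight:
            break
        still_flying = []
        for pos, pkt, hops in in_flight:
            if pkt.dest == pos:
                # The node's only decision: this packet is for me.
                delivered[pkt.payload] = (pos, hops)
            else:
                # Not for me: pass it on according to the fixed schedule.
                still_flying.append(((pos + 1) % num_nodes, pkt, hops + 1))
        in_flight = still_flying
    return delivered

result = run_ring(8, [(0, Packet(3, "a")), (6, Packet(1, "b"))])
print(result)
```

Because forwarding is fixed, the delivery time of a packet in this toy depends only on where and when it was injected, which hints at why a scheduled scheme can support features such as bandwidth reservation.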

11:00am  -  11:30am 

2.6  Large-scale  photonic  packet  switch  using  wavelength  routing  techniques,  Koji  Sasayama,  NTT  Network  Innovation  Laboratories, 
Kanagawa,  JAPAN 

This  talk  describes  the  large-scale  photonic  packet  switching  system  being  developed  in  NTT  Laboratories.  It  uses  wavelength-division-multiplexing 
(WDM) techniques to attack Tbit/s-class throughput. The architecture is a simple star with modular structure and effectively combines optical WDM
techniques  and  electronic  control  circuits.  Recent  achievements  in  important  key  technologies  leading  to  the  realization  of  large-scale  photonic 
packet  switches  based  on  the  architecture  are  described.  It  is  confirmed  that  a  320-Gbit/s  system  can  tolerate  the  polarization  and  wavelength 
dependencies  of  optical  devices.  Experiments  using  rack-mounted  prototypes  demonstrate  the  feasibility  of  the  architecture.  The  experiments  showed 
stable system operation and high-speed WDM switching capability up to the total optical bandwidth of 12.8 nm, as well as successful 10-Gbit/s 4x4
broadcast-and-select  and  2.5-Gbit/s  16  x  16  wavelength-routing  switch  operations. 
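
The figures quoted in abstract 2.6 can be put side by side with a short calculation. The sketch assumes that aggregate throughput is simply the number of input ports multiplied by the per-port line rate; that simplification, and the names used, are illustrative assumptions and not the authors' definition.

```python
def aggregate_gbps(ports: int, rate_gbps: float) -> float:
    """Aggregate throughput, assuming every input port runs at the full line rate."""
    return ports * rate_gbps

broadcast_and_select = aggregate_gbps(4, 10.0)   # 10-Gbit/s 4x4 experiment
wavelength_routing = aggregate_gbps(16, 2.5)     # 2.5-Gbit/s 16x16 experiment

# Ports needed at 10 Gbit/s per port to reach the 320-Gbit/s system size.
ports_for_320 = 320 / 10.0

print(broadcast_and_select, wavelength_routing, ports_for_320)
```

Under this assumption both experiments correspond to the same aggregate of 40 Gbit/s, and a 320-Gbit/s system would need 32 ports at 10 Gbit/s each.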

11:30am  -  12:00noon 

2.7 Latency and Scaling Issues in High-Speed Optical TDM Networks, Paul R. Prucnal, Princeton University, Princeton, NJ

An  overview  of  optical  TDM  devices  and  techniques  for  ultra-high  bit  rate  data  communications  is  given  as  well  as  a  discussion  of  the  latency  and 
scaling  issues  present  in  these  systems. 


12:00pm - 1:30pm Luncheon and Working Group Session III
1:30pm - 3:30pm Working Group Session IV
3:30pm  -  7:00pm  Free  Afternoon 
7:00pm  -  8:00pm 

Session:  Optoelectronic  and  Optical  Technologies 

Session  Chair:  Marc  Christensen,  George  Mason  University,  Fairfax,  VA 

7:00pm  -  7:30pm 

2.8  The  Commercial  Applications  of  Optoelectronics,  A  View  from  the  Optoelectronics  Industry  Development  Association  (OIDA), 
Arpad  Bergh,  OIDA,  Washington,  DC 

The  Optoelectronics  Industry  Development  Association  (OIDA)  was  formed  in  1991  to  advance  the  worldwide  competitiveness  of  the  North 
American optoelectronics industry and to promote the application of optoelectronics technology. The OE industry is a collection of six or more
distinct  industries  that  all  depend  on  OE  technology.  This  fragmentation  represents  major  challenges  and  opportunities. 

It  is  difficult  to  draw  a  technology  roadmap  that  serves  all  applications.  On  the  other  hand,  there  are  great  opportunities  to  share  a  common 
infrastructure that can advance a number of non-competing industries. Over the past eight years OIDA has carried out over thirty market survey and
technology roadmap activities to identify emerging markets and shortcomings in domestic technology. Industry-wide consensus was developed
through informal interactions and recommendations were presented to industry and government for action.

The  most  prevalent  impediments  identified  in  these  studies  are  the  exploration  of  new  markets  for  OE  enabled  applications  and  the  ability  to  conduct 
high  volume,  low  cost  manufacturing.  This  talk  will  describe  some  of  the  initiatives  that  OIDA  has  undertaken  to  overcome  these  deficiencies. 

7:30pm  -  8:00pm 

2.9  Board  and  Back-plane  Level  Optical  Circuits  Using  Integrated  Thin-cladding  Polymer  Fibers,  Yao  Li,  Jan  Popelek,  and  Jun  Ai, 

NEC  Research  Institute,  Princeton,  NJ 

This  talk  summarizes  recent  research  activities  at  NEC  Research  Institute  on  optical  interconnections  using  integrated  polymer  fibers.  The  objective 
of  the  research  is  to  study  large-bandwidth,  short-distance,  packageable  optical  solutions  to  address  future  interconnection  needs  at  circuit  board  and 
back-plane levels. We have studied the possibility of using embedded polymer fibers to form a 10 GHz board-level optical clock distribution circuit and
demonstrated the feasibility of a highly efficient and uniform delivery scheme for up to 128 optical terminations. Specialty thin-cladding polymer fiber
bundles were integrated into conventional PCBs. Various performance data will be presented. We also extended this embedding concept to include
polymer fiber image guides (PFIGs), cost-effective 2D image transmission components. We have fabricated some packaged and connectorized
board-level optical circuits to perform point-to-point 2D parallel optical interconnects for future 2D vertical-cavity surface-emitting laser (VCSEL)
and optical detector array based optical interconnects. Among those demonstrated are some 16-node (6x6 bits/node) optical shuffle and butterfly
interconnect circuits using three layers of PFIG embedding. Low insertion loss (< 2 dB) and moderate resolution (11 lp/mm) were obtained. To further
extend the capability of these 2D parallel optical circuits, we are experimenting with a hybrid integration of these PFIGs and free-space micro-optic
components so that branching and add/drop capabilities at different optical nodes can be incorporated.

8:00pm  -  8:30pm 

2.10 Development of Monolithically Integrated Transceivers for Single- and Multi-Channel Fiber-Based Optical Interconnects,

Clifton G. Fonstad, Jr., and Joseph F. Ahadian, Massachusetts Institute of Technology, Cambridge, MA

The Epitaxy-on-Electronics (EoE) optoelectronic integration technology, in which optoelectronic device heterostructures are grown epitaxially on
fully processed GaAs MESFET electronic circuits, has produced uniquely complex monolithic OEICs combining optical emitters and detectors with
high-speed VLSI electronics. This paper describes the development of transceivers for fiber-based optical interconnects using the EoE technology.
Recent progress toward implementing the EoE technology with silicon CMOS electronics and more advanced GaAs technologies will also be
reviewed.


WEDNESDAY, 12 MAY 1999


8:00am - 9:30am

Session: Working Group Solution Presentations

Session Chair: Ashley Saulsbury, Sun Microsystems, Mountain View, CA

9:30am  -  9:45am  Coffee  Break 

9:45am  -  11:15am 
Session:  Plenary 

Session Chair: Philippe Marchand, UCSD, La Jolla, CA

9:45am - 10:30am

3.1 Java, Jini and High Speed Systems of the Future, Bill Joy, SUN Microsystems, Aspen, CO

Until  roughly  1980,  high  performance  systems  were  built  of  multiple  boards,  and  organized  around  a  disk  operating  system.  Sun's  Solaris  and 
SPARC  based  products  have  this  form,  and  applications  like  Oracle  and  SAP  are  focused  around  management  of  the  information  on  the  disk. 

With  the  emergence  of  the  internet  in  the  last  20  years,  systems  are  more  and  more  often  built  on  networking  as  the  interconnect,  often  now  with 
TCP/IP  playing  the  role  that  the  disk  operating  system  did,  providing  the  basic  interconnect  primitives.  Sun's  work  with  AOL  and  Netscape  is  an 
example  of  a  major  activity  for  us  in  this  area,  defining  e-commerce  as  services  relative  to  the  interconnect. 

In the future we believe that there will be a third organization for computing systems, those organized around objects and agents. We have built the
Java  and  Jini  technologies  to  support  these  new  kinds  of  systems. 

This  talk  will  discuss  these  three  organizing  principles  for  computer  systems  (disks,  internetworks  and  objects)  and  the  implications  for  systems 
design. 

11:15am  -  11:30am 
Workshop  Summary 

Matthew  Goodman,  Bellcore,  Red  Bank,  NJ 


Sunday, 

9  May  1999 


1999  IEEE  Workshop 

Interconnections  within  High-Speed  Digital  Systems 


3:00pm  -  3:45pm 
Sun,  9  May  -  Tutorial  I 


Device  and  Interconnect  Technologies 
for ~ 100 GHz mixed-signal ICs


Mark Rodwell

University of California, Santa Barbara


rodwell@ece.ucsb.edu 805-893-3244, 805-893-3262 fax


Device and Interconnect Technologies
for ~ 100 GHz mixed-signal ICs

Two topics:

ICs *for* high-frequency interconnects
RF/wireless, optical fiber
ICs *needing* high-frequency interconnects
100 GHz digital logic, GHz ADCs/DACs

The organization:

what are the future applications ?
what are the requirements ?
what is the state of the art ?
challenges for future high speed ICs
. . . and how my group is attacking them


Transceivers: very fast digital & mixed-signal ICs

Interfaces: very wideband analog circuits, optoelectronics, mm-wave power
Switches: ~10 GHz fast complex digital ICs


RF/Microwave  ADCs/DACs/DDS:  Towards  the  “Software  Radio” 


Transceiver: Today

[Figure: transceiver block diagram. Stages, in order: preselect/RF, conversion, modulation/coding, decision, I/O]


Digital  Modulation  Transceiver 


Advantages 

digital  robustness,  frequency  agility 
complex  modulation  (spread  spectrum...), 
simultaneous  operation  on  many  bands 

Challenges 

ADC  needs  **enormous**  dynamic  range 
DAC  also  must  have  very  high  SNR 
Enormous  data  reduction  in  DSP: 
very  complex,  very  fast  logic 


All Digital Transceiver

[Figure: all-digital transceiver block diagram. Conversion stages, with coding and decision in DSP]


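High resolution ADCs/DACs need high IC speed

Delta-Sigma (noise shaping) ADC

[Figure: delta-sigma ADC block diagram (summing node, ADC) with noise spectrum vs. frequency]

Interpolating (noise shaping) DAC

[Figure: interpolating DAC block diagram (clock, summing node, MSB, DAC) with noise spectrum vs. frequency]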


ADCs/DACs  for  radio: 

high  dynamic  range  required  (10-18  bits) 

Oversampling  ADCs/DACs: 

high  resolution  obtained  through  high  oversampling 

Microwave  ADCs  need  very  fast  logic,  very  fast  transistors 
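
The link between oversampling and resolution can be illustrated with the textbook ideal-SQNR estimate for a noise-shaping converter. This is a minimal sketch and is not taken from the tutorial: it assumes an ideal quantizer with white quantization noise and a full-scale sine input, and the function names and the example values (1-bit quantizer, second-order loop, 100 MHz signal bandwidth) are hypothetical.

```python
import math


def ideal_sqnr_db(bits: int, order: int, osr: float) -> float:
    """Peak SQNR in dB of an ideal noise-shaping converter.

    bits  : quantizer resolution N
    order : noise-shaping loop order L (0 means plain oversampling)
    osr   : oversampling ratio, f_sample / (2 * signal_bandwidth)
    """
    nyquist_term = 6.02 * bits + 1.76
    shaping_penalty = 10 * math.log10(math.pi ** (2 * order) / (2 * order + 1))
    oversampling_gain = (2 * order + 1) * 10 * math.log10(osr)
    return nyquist_term - shaping_penalty + oversampling_gain


def effective_bits(sqnr_db: float) -> float:
    """Convert an SQNR in dB to an equivalent number of bits."""
    return (sqnr_db - 1.76) / 6.02


if __name__ == "__main__":
    signal_bw_hz = 100e6  # hypothetical signal bandwidth
    for clock_hz in (1e9, 10e9, 100e9):
        osr = clock_hz / (2 * signal_bw_hz)
        sqnr = ideal_sqnr_db(bits=1, order=2, osr=osr)
        print(f"clock {clock_hz / 1e9:5.0f} GHz  OSR {osr:6.1f}  "
              f"SQNR {sqnr:6.1f} dB  ~{effective_bits(sqnr):4.1f} bits")
```

Under these assumptions each doubling of the clock rate adds (2L + 1) x 3 dB of SQNR, which is why a fixed signal bandwidth at 10 to 18 bits pushes the modulator clock, and therefore the logic and transistor speed, so high.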


Requirements:  100  GHz  clock-rate  logic 


Fast  transistors: 

ADCs  etc  need  very  high  ratio  of  transistor  to  signal  bandwidth 
High  performance  wiring : 

millimeter-wave  bandwidths  with  analog  &  digital  signals  ! 
microstrip lines and ground-planes for signal integrity
power delay products and impact of wiring

Outstanding heatsinking:

clock rates will be very high, so wiring delays must be small
transistors must be close together !
high performance transistors use high power densities !
power density on die may approach 1 kW/cm^2 !


Bandwidth  of  Bipolar  Transistors 


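[Figure: bipolar transistor equivalent circuit]

$$ f_\tau = \frac{1/2\pi}{\tau_{base} + \tau_{collector} + C_{je}\,(kT/qI_E)} $$

$$ f_{max} = \sqrt{\frac{f_\tau}{8\pi R_{bb} C_{cb}}} $$

f_τ, f_max, and C_cb are all important
for high-speed circuits


Transit times and R_bb C_cb must be reduced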


Current-gain  cutoff  frequency  in  HBTs 

[Figure: HBT cross-section showing emitter and collector]


Collector  velocities  can  be  high:  velocity  overshoot  in  InGaAs 
Base  bandgap  grading  reduces  transit  time  substantially 
RC  terms  quite  important  for  >  200  GHz  ft  devices 


Fmax  in  Double-Mesa  HBTs 


[Figure: double-mesa HBT cross-section (emitter, base contacts); the base resistance expression includes contact and horizontal (spreading) terms]


Scaling emitter width does reduce
base spreading resistance.

- but -

Minimum base resistance set by
base contact resistance.

Minimum collector capacitance
set by minimum base contact size


Fmax  in  Transferred-Substrate  HBTs 


[Figure: transferred-substrate HBT cross-section (collector contact); the base resistance expression includes contact, horizontal, and sheet-resistance terms]


R_bb C_cb reduces rapidly with
deep  submicron  scaling 

Component  due  to  contacts 
scales  as 

Base  spreading  component 
scales  as 


Fmax  increases  rapidly  with  deep  submicron  scaling 


Transferred-Substrate HBTs: A Scalable HBT Technology

[Figure: plot vs. emitter width (0 to 1.5 microns), vertical axis 0 to 1000; curves labeled "Transferred-Substrate HBT" and "1.0 μm base Ohmics"]

•  Collector capacitance reduces with scaling

•  Bandwidth increases rapidly with scaling


Transferred-Substrate  HBT:  Stepper  Lithography 


[Figure: gains vs. frequency (GHz), showing Mason's gain U and h21; f_τ = 147 GHz, f_max = 805 GHz]


0.4 μm emitter, ~0.7 μm collector


Proposed  THz-Bandwidth  HBT  ? 

deep submicron

transferred-substrate

regrown-base HBT


1) regrown P+++ InGaAs extrinsic base -> ultra-low-resistance

2) 0.05 μm wide emitter -> ultra low base spreading resistance

3) 0.05 μm wide collector -> ultra low collector capacitance

4) 100 Å, carbon-doped graded base -> 0.05 ps transit time

5) 1 kÅ thick InP collector -> 0.1 ps transit time.

Projected Performance:

Transistor with 500 GHz ft, 1500 GHz fmax


Ground Bounce Noise in ADCs


[Figure label: ground bounce noise]


Ground bounce noise must be ~100 dB below full-scale input
Differential input will partly suppress ground noise coupling
~ 30 to 40 dB common-mode rejection feasible
CMRR insufficient to obtain 100 dB SNR

Eliminate ground bounce noise by good IC grounding


Microstrip IC wiring to Eliminate Ground Bounce Noise


[Figure labels:]
Brass carrier and assembly ground
interconnect substrate
IC with backside ground plane & vias
near-zero ground-ground inductance
IC vias eliminate on-wafer ground loops


Transferred-substrate  HBT  process  provides  vias  &  ground  plane. 


The microstrip via inductance problem


12 pH via inductance for 100 micron MMIC substrate
j7.5  Ohms  at  100  GHz,  j15  Ohms  at  200  GHz 
A  formidable  difficulty  for  >100  GHz  IC  design 
At  100  um  substrate  thickness,  via  spacing  must  be  >  100  um 
Solutions  include  “masterslice”,  flip-chip,  substrate  transfer 
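
The reactance figures quoted above follow directly from X = 2πfL. The short sketch below reproduces them; the function name and the extra frequency points are illustrative choices, not part of the slide.

```python
import math


def via_reactance_ohms(inductance_h: float, frequency_hz: float) -> float:
    """Inductive reactance X = 2*pi*f*L of a ground via, in ohms."""
    return 2 * math.pi * frequency_hz * inductance_h


VIA_INDUCTANCE_H = 12e-12  # 12 pH, the value quoted for a 100 micron substrate

if __name__ == "__main__":
    for f_ghz in (10, 50, 100, 200):
        x = via_reactance_ohms(VIA_INDUCTANCE_H, f_ghz * 1e9)
        print(f"{f_ghz:4d} GHz: j{x:.1f} Ohms")
```

At 100 GHz the via alone presents about 7.5 Ohms in series with ground, a large fraction of a 50 or 100 Ohm line impedance, which is why the slide treats it as a formidable difficulty.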


Deep  submicron  HBT  logic:  low  power  ? 


Device sized for 100 Ω load:
(200 mV ECL logic swing)
0.15 μm x 6 μm emitter
peak speed at 2 mA bias

Shorter stripe length device:
0.15 μm x 0.5 μm emitter
peak speed at 150 μA bias


Small device is low-power but cannot drive 100 Ω line.

drives  line  with  mismatched  impedance:  capacitance 
lower  power  at  higher  (wiring-limited)  delay 

fast  low-power  logic  requires  low-voltage-swing-logic 


Power-delay  product  in  Interconnect  limit 

$$ P_{gate}\,\tau_{prop} = (1/2)\,C_{wire} V_{cc} V_{logic} $$

bipolar logic (static power)

$$ P_{gate} / f_{clock} = (1/2)\,C_{wire} V_{dd}^2 $$

CMOS logic (dynamic power)


$$ \tau_{prop}\, f_{clock} \sim 1 / (\text{number of gates between latches}) $$


For fast, low-power logic: reduce V_cc V_logic


Fiber Optic
ICs

not  yet  tested 
(design  40  Gb/s) 


AGC / limiting  amplifier 


PIN /transimpedance  amplifier 


CML  decision  circuit 


Delta-Sigma  ADC  (300  HBTs) 


Fast  ICs  for  fast  interconnects 
Fast  ICs  needing  fast  interconnects 


ICs  for  GHz  communications: 

Optical  fiber  transmission  to,  beyond  40  Gb/s 
with electronic data switching
millimeter-wave (60/90/180 GHz) wireless networks
at mm-wave, bandwidth is cheap & plentiful
...but the hardware must become cheap
ADCs, DACs for digital processing of RF signals

Challenges for fast ICs

Fast transistors: scaling is key

Wiring environment: signal, ground and power integrity
Interconnect-limited power-delay products
Managing  high  dissipated  power  densities 


4:45pm - 5:30pm

Sun, 9 May - Tutorial 3


ADVANCES IN CHIP LEVEL PACKAGING
John C. Carson


Irvine  Sensors  Corporation 
3001  Redhill  Avenue  Bldg  3 
Costa  Mesa  CA  92626 




DIFFERENT  TRENDS  IN  PACKAGING 


(from R. Heitmann, Universal Instruments Inc. - EPP May


Excerpts from SIA Roadmap for Cost/Performance Category

                              1995-2000    2000-2005    2005-2010
feature size (μm)             0.35-0.25    0.18-0.13    0.10-0.07
transistors/cm^2              4M-7M        13M-25M      50M-120M
pin count                     300-1000     1200-2000    2400-3600
package thickness (mm)        1.0-2.0      1.0          0.5-1.0
package cost (cents/pin)      1.4-4        1-2          0.6-1.3
package size (mm)             23-45        29-50        35-50
lead pitch - peripheral (mm)  0.3-1        0.3-0.65     0.3-0.5
lead pitch - array (mm)       1.0-1.5      1.0          0.5-0.65
power (W)                     2-18         2-28         2-55
performance (MHz)             100-200      200-400      400-1000

TRENDS  OBSERVED 

•  Packages  are  getting  thinner 

•  Number  of  leads  is  getting  larger 

•  Package  footprint  decreasing  to  approach  chip  size 

•  Direct  Chip  attach  techniques  are  emerging 

•  Package  pin  pitch  is  decreasing 

•  Package  pins  are  being  distributed  in  array  format 

HOWEVER,  SUPPORTING  SUBSTRATES  ARE  BECOMING  MORE 
AND  MORE  LIMITING 

SYSTEM  DESIGNERS  ARE  SEEKING  SOLUTIONS  IN  MULTI-CHIP 
AND  3D  PACKAGING  TECHNIQUES  ALONG  WITH  SYSTEM-IN-A- 
CHIP  APPROACHES 




FOOTPRINTS  OF  CHIP  SCALE  AND  CONVENTIONAL 
PACKAGES 


              Conventional
Lead count    TSOP    Fine Pitch QFP    CSP
28            152     56                11
32            168     63                19
52            169     81                31
64            169     99                36

Areas in mm^2.

Adapted from P. Thompson (IEEE Spectrum, August 1997)

TYPES OF CHIP SCALE PACKAGES

RIGID-INTERPOSER TYPE
[Figure labels: die, die bumps, fill, solder balls, ceramic or organic interposer]
NUCSP, Ceramic fine pitch BGA, mini BGA,
SFFP, Stud bump bond, SLIC, Flip-chip BGA,
Chip size thin package

FLEXIBLE-INTERPOSER TYPE
[Figure labels: die, die bumps, fill, film interposer (polyimide, tape), solder balls]
Chip-on-flex, Flip-chip BGA, JACS-PAK, Fine
pitch BGA, Resin Molded CSP, FBGA, μBGA,
μstar BGA

WAFER LEVEL (MICRO SURFACE MOUNT)
[Figure labels: die, wirebond, post]
Micro Surface Mount, SlimCase, Mini BGA

CUSTOM LEAD FRAME
[Figure labels: lead frame, molding compound]
Small Outline Nolead, Lead on Chip, μstud BGA,
Bottom Lead Package, Molded bump, Very Small
Peripheral Array, Flip-Tape Carrier

PACKAGES  ARE  GETTING  THINNER 


1.2  mm  TSOP 


0.5  mm  Dual  In  Line  Tape  Carrier 


0.4  mm  Ultra  Thin  Chip  Package 


0.3  mm  Slim  Case  (from  ShellCase) 

•  Thin  packages  can  be  made  only  100  microns  larger  than 
the  die  in  length,  width  and  height 

•  Note that the incoming wafer thickness is 0.6 - 0.7 mm

•  The  die  in  thinner  packages  is  in  the  range  of  100  microns 
(0.1  mm)  thin 

•  Advanced  die  thinning  techniques  required  to  support 
packaging  development 



EXAMPLE  OF  CHIP  SCALE  PACKAGES 


[Figure: cross-sections of the MSMT and MGA packages, with solder connections and dimensions]

Micro Grid Array ™ (MGA)


Micro Surface Mount
technology (MSMT)
package from ChipScale
Inc.

The package is formed in
wafer form

MSMT yields active
surface of the chip away
from PCB

MGA technology
produces packages with
active face down to PCB


MGA provides a standoff from
chip surface by using a compliant
epoxy and can be placed
anywhere on the chip using wafer
level processing


EXAMPLE OF CHIP SCALE PACKAGES
Ultrathin CSP from ShellCase


COMPARISON  OF  CONVENTIONAL  AND  WAFER  LEVEL  CHIP 
PACKAGING 


Traditional IC Packaging            Wafer Level Packaging

Wafer is probed, diced and sorted   Wafer moved directly to packaging
ICs packaged away from fab          ICs packaged in fab
ICs are packaged one at a time      ICs are packaged en masse
Burn-in performed in sockets        Burn-in performed on wafer
Power and ground taken from PCB     Power and ground distributed in assembled structure
Device tested 2-3 times             Device tested once
High pin counts required            Lower external I/O possible
Higher power required               Reduced power requirements
All function in the chip            Function shared between package and chip
More complex substrate required     Simpler substrates possible (lower I/O)
Lead inductance concerns            Lead inductance nearly eliminated

Source: J. Fjelstad, Tessera Inc.




SUPPORTING  TECHNOLOGIES  FOR  ADVANCED  PACKAGING 

Advanced  Packaging  requires  the  utilization  of  the  following 
techniques  extensively  : 

•  thinning  of  silicon  wafers  containing  circuits 

•  bump  bonding  for  high  I/O  density  interface 

•  handling  of  KGD  in  die  form 

•  handling of die of different sizes and origins, non-electronic
chips (e.g. MEMS, Lasers, Detectors, Fluidic Devices)

Therefore,  advances  in  these  techniques  will  help  to 
increase  the  density  and  the  functionality  of  advanced 
packages 




ULTRA-THIN  SILICON  CIRCUITS 

•  A  Kapton  (50  micron  thick)  based  flexible  test  vehicle  has  been 
used  to  test  ultra-thin  flash  die 

•  25  micron  thin  16  Mb  Flash  die  has  been  successfully  tested  after 
mounting  on  the  test  vehicle 

•  25  Micron  thin  memory  die  mounted  on  the  flexible  substrate  is 
bendable  with  the  substrate.  A  bending  radius  of  1  mm  can  be 
obtained  for  each  micron  of  silicon  thickness 




TECHNOLOGIES  AND  PRODUCTS  BASED  ON 
THIN  SILICON  CIRCUITS 


•  thin  integrated  circuit  stacks  with  higher  capacity  (10  times  more) 

•  flexible  circuitry  in  conformal  packaging 

-  medical  applications  (shape  conforming  sensors) 

-  space  applications  (SOI-like  advantages) 

-  wearable  products 


-  smart cards


[Figure captions:]

18 layers of 20 micron thin Si stack compared to ISC's standard
short stack: within same height, the new thin stack can
accommodate 10 times more layers

"Smart Band-Aid" for body function monitoring

Bendable circuits for space microprobes,
medical microprobes and smart projectiles


A revolutionary chip-level layering and stacking concept to
eliminate the same-size restriction and wafer level inventory
requirements of existing processes


The process is designed to re-create a wafer from individual and heterogeneous
chips for batch processing by embedding them into an epoxy frame.
After lithography and metalization, the wafer will be diced into neo-die of identical
sizes that contain each layer to be stacked.

Mature stacking technology and tools will be used to stack many layers and
interconnect layers.

NEO  STACKING  APPROACH 

•  Starting with KGD, construct a new, or neo-wafer with many dice in
a molding compound matrix

•  Use a standard neo-die size, just slightly larger than the largest die
in the stack

•  Add blank silicon to open areas on layers where smaller die are
used to enhance thermal conduction between layers if needed

•  Perform metalization and thinning in neo-wafer form

•  Dice into individual layers

•  Laminate into a stack

Neo-stacking is a breakthrough in high density packaging technology

•  It  allows  complete  systems  in  a  cube 

•  It  allows  the  combination  of  massive  electronic  functions  with 
extreme  miniaturization  and  integral  logic  and  control  functions 

•  dense  layer-to-layer  interconnects  through  the  epoxy  molding  layer 

•  The  process  is  highly  manufacturable  through  industry  standard 
automated  tooling  and  batch  processing 




NEO  STACKING  FABRICATION  EXAMPLES 


Potting  Compound  Boot  Flash  Memory  Chip 




HIGH  SPEED  OPERATION  IN  CHIP  STACKS 


•  ALCATEL-Espace 
demonstrated  the  basic 
operation  of  stacked 
microwave circuits in Ku-
band (10.7 - 12.7 GHz)
range 

•  Insertion  losses  were 
about  -0.5  dB/mm 

•  Results  indicate  potential 
operation  up  to  30  GHz. 


From: 3D Microwave Modules for Space Applications, P. Monfraix et al.


CHIP  LEVEL  3D-PACKAGING  ROADMAP 


YEARS                                             1998    2000    2002    2004

In-plane line density (lines/cm)                  500     1000    1500    2000
In-plane total number of metalization layers      2       3       4       5
Side-face line density (lines/cm)                 200     400     800     1000
Side-face total number of layers                  1       2       2       3
Areal line density (new technology) (lines/cm^2)  900     1600    2500    5000
(row label missing in source)                     1       2       4       8
(row label missing in source)                     10      20      30      50

ELECTRICAL  CHARACTERISTICS  OF  TYPICAL 
PACKAGE  INTERFACES 


100 mm^2 chip        Bare die with      Flip-chip      Quad Flat Pack with   MicroBall Grid array
                     75 μm wire bond    0.5 mm bump    75 μm wire bond       1 mm bump

Pitch (mm)           0.15               0.25           0.30                  0.50
Footprint (mm^2)     125                125            785                   150
Package/chip area    1.25               1.25           7.85                  1.5
Height (mm)          0.4-0.6            0.5-0.7        1.4                   0.84
Inductance (nH)      1-2                0.05-0.2       1-7                   0.5-2.1
Capacitance (pF)     0.2                0.05-0.1       0.5-1                 0.05-0.3


COMPARISON  OF  ADVANCED  PACKAGING  APPROACHES 




5:30pm - 6:15pm
Sun, 9 May - Tutorial 4


SAMPLING  FREQUENCY 


How frequently should we sample for SNMP?


•  The sampling frequency determines the resolution of the data and
its storage requirements.

•  Larger sampling intervals result in smoother data summaries and
hide the variation between samples.

•  Network performance statistics may have a periodic component. If
so, the data sampling period should be less than half of that period. The
data period should not be divisible by the sampling period;
otherwise the samples will consistently be taken at peaks,
midpoints or low points of the data.

•  When using SNMP to measure Internet/Intranet performance, do
not let network management traffic swamp regular traffic.
(Heisenberg Uncertainty Principle: We disturb the object we
measure - the more precisely we try to measure it, the more we
disturb the object)
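
The two rules on the sampling period (less than half the period of the data, and not an exact divisor of it) can be checked mechanically. The sketch below is one possible way to do so; the function name, the tolerance, and the one-hour traffic cycle used in the example are assumptions made for illustration.

```python
from typing import List


def check_sampling_period(data_period_s: float,
                          sampling_period_s: float,
                          tolerance: float = 1e-9) -> List[str]:
    """Return the list of rules violated by a candidate sampling period."""
    problems = []

    # Rule 1: sample at least twice per cycle of the periodic component.
    if sampling_period_s >= data_period_s / 2:
        problems.append("sampling period is not less than half the data period")

    # Rule 2: the data period must not be a whole multiple of the sampling
    # period, or every sample lands on the same phase of the cycle.
    ratio = data_period_s / sampling_period_s
    if abs(ratio - round(ratio)) < tolerance:
        problems.append("data period is an exact multiple of the sampling period")

    return problems


if __name__ == "__main__":
    hourly_cycle_s = 3600.0  # hypothetical periodic component in the traffic
    for candidate_s in (600.0, 420.0, 2000.0):
        issues = check_sampling_period(hourly_cycle_s, candidate_s)
        print(candidate_s, "OK" if not issues else issues)
```

In this example a 600 s interval divides the hour exactly and is flagged, while 420 s satisfies both rules.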


•  NumberOfSamplingPoints = 1000

•  PacketsPerSample = 2

•  TimeBetweenSamples = 60 seconds

•  SizeOfPacket = 85 bytes

•  MediaSpeed = 256,000 bits/second


$$ \text{SampleRate} = \frac{\text{NumberOfSamplingPoints} \times \text{PacketsPerSample}}{\text{TimeBetweenSamples}} = 33.33\ \text{samples/second} $$

$$ \text{Utilization} = \frac{\text{SampleRate} \times \text{SizeOfPacket} \times 8}{\text{MediaSpeed}} \times 100 = 8.854\,\% $$
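
The polling-overhead arithmetic on this slide can be reproduced in a few lines. This is a sketch using the slide's own parameter values; the function name is hypothetical, and the result is interpreted as a percentage of the link speed.

```python
def snmp_polling_load(num_points: int,
                      packets_per_sample: int,
                      interval_s: float,
                      packet_bytes: int,
                      media_bps: float):
    """Return (packets per second, link utilization in percent)."""
    packet_rate = num_points * packets_per_sample / interval_s
    utilization_pct = packet_rate * packet_bytes * 8 / media_bps * 100
    return packet_rate, utilization_pct


if __name__ == "__main__":
    rate, util = snmp_polling_load(
        num_points=1000,
        packets_per_sample=2,
        interval_s=60,
        packet_bytes=85,
        media_bps=256_000,
    )
    print(f"{rate:.2f} packets/s, {util:.3f} % of the link")
```

Changing `interval_s` or `media_bps` shows how quickly management traffic grows on slow links: halving the polling interval doubles the utilization.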


TRAFFIC  CHARACTERISTICS 

•  LAN analyzers, Expert Sniffer, RMON MIB, and TCPDump all
provide information about packet size distributions. When in doubt
about the probability distribution function, assume exponential.

•  Packet inter-arrival times can also be obtained using LAN
analyzers or sniffers. You may use an exponential distribution when
you do not have a better model for the inter-arrival times. An exponential
distribution will not fit all data. For example, RIP (Routing
Information Protocol) sends its routing tables on all interfaces
every 30 seconds. While the client requests may be exponentially
distributed, the responses may be fixed-size, closely spaced
packets.

•  The business cycle defines how the average packet rate fluctuates, and
the peak value should be used for performance analysis.

•  The average packet arrival rate and the average packet size
should be multiplied for link bandwidth selection.
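
The last two points can be sketched as follows: multiply the busy-period arrival rate by the mean packet size to size a link, and fall back on exponential inter-arrival times when no better model is available. The function names and the example figures (200 packets/s, 512-byte packets) are hypothetical, and the result is the mean offered load only, with no headroom for bursts.

```python
import random
from typing import List


def required_bandwidth_bps(peak_packets_per_s: float,
                           mean_packet_bytes: float) -> float:
    """Mean offered load in bits/second at the busy-period packet rate."""
    return peak_packets_per_s * mean_packet_bytes * 8


def exponential_interarrivals(rate_per_s: float,
                              count: int,
                              seed: int = 0) -> List[float]:
    """Draw inter-arrival gaps (seconds) from an exponential distribution."""
    rng = random.Random(seed)
    return [rng.expovariate(rate_per_s) for _ in range(count)]


if __name__ == "__main__":
    peak_rate = 200.0   # packets/s during the busiest part of the business cycle
    mean_size = 512.0   # bytes, e.g. from a sniffer's size histogram
    print(f"offered load: {required_bandwidth_bps(peak_rate, mean_size):,.0f} bit/s")

    gaps = exponential_interarrivals(peak_rate, 10_000)
    print(f"mean simulated gap: {sum(gaps) / len(gaps) * 1000:.2f} ms")
```

The simulated gaps can feed a queueing or discrete-event model; periodic traffic such as RIP updates would need to be added separately, since it does not follow the exponential assumption.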


SNIFFER  TRACE 




BENCHMARKING 

•  Before a new application is deployed on a network, benchmark
data should be collected to predict the impact of the new
application on the network resources.

•  Questions:

-  Is enough data being collected to provide a statistically
valid sample?

-  Will the real network actually experience the type of
traffic being measured?

•  While collecting benchmark traffic data for an application, the
application should run on an isolated segment. Otherwise
contamination will occur from data not participating in the
benchmarking study.

•  Measurements taken on one type of network (Ethernet, for
example) do not apply to other networks (Token Ring).


BENCHMARKING 
Sources  of  contamination 

•  Target and measurement media are different (differences such as
MTU, framing characters etc. should be compensated)

•  Network operating systems are dissimilar.

•  Applications are similar but not identical.

•  Queues existed at the servers and network links at the time of
measurement.

•  Disk I/O delays are lumped in with CPU instruction delays.

•  The LAN carried other traffic not related to benchmarking.

•  Measurements were taken remote from the server system.

•  The server is doing other work in addition to our application.

•  The benchmark network and systems are already highly utilized.

•  Even if the application will be deployed over a wide area network with a
number of remote clients, benchmarking data still should be collected
over a LAN system with ONE client and server on the same LAN.


NETWORK  DESIGN  PROCESS 


Design 

Parameters


Design  inputs 


Required  response  time 

Current  and  planned  services  and  applications  to  be 
supported 

Anticipated  future  services  and  application  needs 
Communication  protocols,  including  network 
management  protocols  to  be  supported 
Network  management  functions  to  be  supported 
Reliability  Requirements 
Implementation  and  maintenance  budget 


DESIGN  PARAMETERS 

What  type  of  traffic  will  be  carried  ? 

Is  the  data  to  be  carried  time  sensitive?  If  yes ,  what  are  the  delay 
and  delay  jitter  requirements? 

Is  the  data  bursty  in  nature  ?  Will  smoothing  affect  application  ? 
What  are  the  acceptable  bit  error  rates  and  packet  loss  rates? 

Is  the  traffic  symmetric  ( such  as  in  videoconferencing)  ? 

Are  there  any  applications  that  will  benefit  from  multicasting  or 
broadcasting  ( such  as  digital  video  broadcasting)  ? 

What are the reliability requirements ? What are the tolerable times
for recovery in case of failures?

What are the security requirements ?

What protocols will be supported ?

Total network budget and the percentage reserved for network
management, analysis and data collection tools.

Self similarity of network sites for reducing operating costs versus
initial deployment costs.


WAN  DESIGN 


Type  of  service  decisions 

•  Frame Relay

•  ATM

•  DDS Lines

•  SMDS

•  Private  versus  public  leased  lines 

•  etc. 

Topology  decisions 

•  Star 

•  Backbone 


Link  and  Node  selection  and  sizing 


LAN  DESIGN 

Topology  decision  (Bus,  Ring,  Tree ,  Star  etc.) 
Access  Protocol  ( CSMA/CD,  Token  Ring,  etc.) 
Frame/ Packet  Size 
Transmission  Capacity 
Signal  propagation  delay 
Buffer  size 
Processing  delays 
Throughput 
User  traffic  profile 
Data  collision  and  retransmission 
Bridging/Routing  decisions 
Security
Availability


Video  Network  Design 

•  Design a network to support video conferencing between three
remote locations (Santa Fe, San Diego and Mexico). Each of these
locations has one or more active participants introducing compressed
video at a rate of 384 kbps into the network (using H.261).

•  There may be more than one passive participant (listening only -
destination of the data) at each site.

•  Employ  multicasting  to  reduce  the  duplicate  traffic  on  LAN  and 
WAN  links  where  possible. 

•  Simulate  the  proposed  model  and  provide  performance  measures 
on  the  LAN  and  WAN  Links. 

•  Based  on  the  simulations  refine  the  design  (Topology,  link  and 
nodes,  WAN  service  type  etc) . 
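
As a starting point for this exercise, the effect of multicasting on WAN load can be estimated before any simulation. The sketch below rests on assumptions that are not in the exercise: each site reaches the WAN through a single access link, active participants also receive every remote stream, the network replicates multicast packets, and the head counts per site are invented.

```python
STREAM_KBPS = 384  # H.261 rate per active participant, from the exercise

# Hypothetical head counts per site: (active senders, passive listeners).
SITES = {
    "Santa Fe": (1, 2),
    "San Diego": (1, 1),
    "Mexico": (1, 0),
}


def wan_uplink_kbps(site: str, sites: dict, multicast: bool) -> int:
    """Video load that a site places on its own WAN access link, in kbps."""
    active_here, _ = sites[site]
    remote_receivers = sum(
        active + passive
        for name, (active, passive) in sites.items()
        if name != site
    )
    # Unicast: one copy of each local stream per remote receiver.
    # Multicast: at most one copy per local stream; the network replicates it.
    copies = min(1, remote_receivers) if multicast else remote_receivers
    return active_here * STREAM_KBPS * copies


if __name__ == "__main__":
    for name in SITES:
        unicast = wan_uplink_kbps(name, SITES, multicast=False)
        mcast = wan_uplink_kbps(name, SITES, multicast=True)
        print(f"{name:10s} unicast {unicast:5d} kbps   multicast {mcast:4d} kbps")
```

The unicast load grows with every added listener while the multicast load stays at one stream per sender, which gives a first estimate for link sizing that the simulation can then refine.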


NOTES 


Monday, 

10  May  1999 


8:15am  -  8:45am 
Mon,  10  May  -  1.1 


10  Gigabit 
Ethernet 

Peter  Wang 

Technology Development Center
May 10, 1999


Outline

Drive  Towards  10  Gb  Ethernet 

Motivations 
Applications  of  10  GbE 

-  Requirements 

Technical  Issues  &  Technology  Enablers 

-  Transmission  medium 
-  PMD

-  PMA

-  PCS

-  MAC

Summary 


Drive  Towards  10  Gb 
Ethernet 


Customer problems to be solved
with 10 GbE


•  Traditional LAN applications and private
enterprise applications

-  GbE to 10 GbE aggregating switches

-  Linking multi-port 10 GbE Switches

-  Linking multi-Gbps Routers inside a LAN

•  LAN and Non-LAN private enterprise
applications

-  High speed clustered computing interconnects.
»  NGIO & Future I/O

-  Provide point-to-point backplane connection

•  Access, MAN, "RAN", WAN(?)


Cost  Ratios:  Sw  1000  to  Sw  100 


10 GE cost expectation: < 10x GE


Technical  Issues  & 
Technology  Enablers 


Protocol  stack 


Medium 


Standard  Single-mode  Fiber 

-  Campus backbone and metro area network.

-  Suitable for long wavelength lasers

»  1.3 um: fiber attenuation limited
»  1.5 um: dispersion limited

Multi-mode Fiber

-  Server-switch connections; horizontal &
vertical risers

-  Large core size, low cost transceivers

-  ISI from intermodal and intramodal dispersion
limits the bit rate and distance

-  CWDM on existing or serial on adv. MMF

Copper

-  Server-switch in data center

-  Extremely limited distance

-  Complex PMA if multilevel signal


Laser  Technologies 

Fabry-Perot 

-  1.3  um:  SMF/MMF 

-  Simple  structure  and  low  cost 

- Short reach, multi-mode laser

» distance limited by dispersion and mode-partition noise

Vertical  Cavity  Lasers  (VCSELs) 

- 0.85 um/1.3 um; SMF/MMF

- Low cost and easier packaging

-  Parallel  optics  CWDM 


PMD  Components  -  Modulator 

•  Used  for  extended  reach 
Requirements 

- High modulation speed & linearity
- Low driving voltage & cost

- Packaging size

Technologies 

- LiNbO3

-  Electro-Absorption 

-  Hybrid  integration  with  DFB  lasers 


PHY Components Status


[Table: status of PHY components (lasers, photodetectors, laser drivers, TIAs, amplifiers, CDR, mux/demux) by technology, availability, bit rate (10 to 12.5 Gbps) and reach; individual entries are not legible in the scan]


PCS Sublayer


Encoding/decoding of data from/to MAC
Need for 10 GMII
» Single/multi-channel operation
Auto-negotiation
- 1000/10000?


MAC 


MAC  Challenges  and  Solutions 


Ethernet without CSMA/CD

- Connectionless packet transmissions

- Consistent addressing & frame format
» No translation, nor segmentation

Functions 

- Full-duplex only, speed/distance independent

» Inter-frame gap (IFG) & preamble size in bit times

- Flow control

» Pause operation response time in bit times

-  Auto  negotiations 


Issues  &  Challenges 

- Bus interface (Backplane/MAC and MAC/PCS)

» Parallel data width (10/16/20/32/64 etc.; single-ended vs. differential)

» clock rate (1.25G/625M/622M/312.5M etc.)

» chip-to-chip skew

- Processing Ethernet frames with very short lookup time

-  Buffering 

Technologies  &  Solutions 

Interfaces 

- SSTL, PECL, LVDS, CML
- Advances in CMOS

» 0.25 um process now, migrating to 0.18 um
» embedded DRAM for buffer

» Larger min frame size and architectural improvements


Standard Scope - TBD

Distance/Media

- 802.3z link model?

- Existing cabling standard adequate?

» New advanced MMF?

» Is copper (i.e. coax) worth considering?

Wavelengths 

-  850/1300/1550  nm? 

Channels 

- Serial only? Or include CWDM?

Coding 

- 8B/10B, 14B/15B, 16B/18B, scrambling,
multilevel analog?

Quality 

-  Laser  safety? 

-  BER?  Jitter? 

-  EMI 


Potential Starting Point for 10GE
Discussions

•  MAC 

- full-duplex only

-  min.  packet  size  (256  vs  64  bytes) 

•  MAC/PCS/PMA  standard  interfaces 

•  10  Gbps  (data) 

• Encoding (8B/10B)

•  Single  wavelength  in  1300  nm  window 

•  Standard  single  mode  fiber 

•  Up  to  30  km 




8:45am - 9:15am
Mon, 10 May - 1.2


PAROLI: a Synchronous Interconnection Link
with a Throughput of 13 Gbit/s


Karsten Drögemüller
karsten.droegemueller@infineon.com


PAROLI

Parallel Optical Link


Competing Data Link Technologies

[Figure: Regions of Competitive Advantage for electrical wire ribbon, coax, serial and parallel optical links; x-axis: Interconnection Distance (m), 1 to 100000]

Why  Optical  Interconnect  ? 


Copper  BW  x  distance  product  limited 

BW  demand  drastically  increasing 

Cable  size  is  1/50th  of  copper 

Optics  is  the  solution  to  escalating  EMI  problems 

Cost  is  not  much  higher  than  high  performance  copper 




Tb/s  Chip  I/O  -  how  close  are  we  to 
practical  reality? 


9:15am  -  9:45am 
Mon,  10  May  -  1.3 


Rick  Walker 

Hewlett-Packard  Company 
Palo  Alto,  California 
walker@opus.hpl.hp.com 


Agenda 

•  Applications  and  Key  Specifications 

•  General  Architecture  for  inter-chip  communication 

•  Limitations 

•  Skin-Loss 

•  Delay  Matching  for  Multi-phase  sampling 

•  CMOS  Scaling 

•  Industry  Trends 

•  Conclusions 


Current  Practice 


Current  high-performance  systems  are  skew  limited 
using  parallel  data  clocked  at  250-500Mb/s. 

Using clock and data recovery on Gb/s links eliminates
the skew problem and improves system BW by a factor of
8-16X.


•  What  are  the  limits  for  advanced  systems? 


CPU-CPU/Memory  Application 


I/O 


Router  Application 


Backplane 


Key  Specifications 

• Speed: As high as possible - at least 1 Tb/s I/O per chip

• Latency: critical - less than 10 ns plus time of flight

• BW/link: limited to 4-5 Gb/s by PCB loss

• Power: for a 100 W chip, all 250 links should dissipate
less than 40 W -> 160 mW per link

• Size: a typical processor may be 9 cm^2; if links use 20%
of the total area, then each 4 Gb/s link cell should be
less than 720000 um^2 in size.
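
The per-link numbers on this slide follow from simple division. The sketch below reproduces them in Python; the constant and function names are illustrative, and every input value is taken from the bullets above.

```python
# Per-link budget implied by the Key Specifications slide.
TOTAL_IO_BPS = 1e12         # 1 Tb/s of I/O per chip
LINK_RATE_BPS = 4e9         # 4 Gb/s per link
IO_POWER_W = 40.0           # power allowed for all links together
DIE_AREA_CM2 = 9.0          # typical processor die
LINK_AREA_FRACTION = 0.20   # share of the die given to link cells
UM2_PER_CM2 = 1e8           # 1 cm^2 = 1e8 um^2


def link_budget():
    """Return (number of links, mW per link, um^2 per link cell)."""
    links = round(TOTAL_IO_BPS / LINK_RATE_BPS)
    power_per_link_mw = IO_POWER_W / links * 1e3
    area_per_link_um2 = DIE_AREA_CM2 * LINK_AREA_FRACTION * UM2_PER_CM2 / links
    return links, power_per_link_mw, area_per_link_um2


if __name__ == "__main__":
    links, mw, um2 = link_budget()
    print(f"{links} links, {mw:.0f} mW per link, {um2:.0f} um^2 per link cell")
```

Changing `LINK_RATE_BPS` to 5e9 shows how the budget relaxes at the upper end of the 4-5 Gb/s range quoted for PCB loss.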


General Architecture


Skin  Loss  and  Dielectric  Loss 


Nearly all cables are well modeled by a product of Skin Loss
$S(f) = e^{-k_s \sqrt{f}}$ and Dielectric Loss $D(f) = e^{-k_d f}$, with
appropriate $k_s$, $k_d$ factors. Dielectric Loss dominates in the multi-GHz
range. Both plot as straight lines on a log(dB) vs log(f) graph.
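
A minimal Python sketch of this loss model, assuming the usual forms in which the skin-loss exponent grows with the square root of frequency and the dielectric-loss exponent grows linearly with frequency. The coefficients `K_S` and `K_D` are hypothetical, chosen only so that the two curves cross at 1 GHz; they are not values from the talk.

```python
import math

NEPER_TO_DB = 20 * math.log10(math.e)   # amplitude ratio exp(-x) expressed in dB


def skin_loss_db(f_hz, k_s):
    """Loss in dB for S(f) = exp(-k_s * sqrt(f))."""
    return NEPER_TO_DB * k_s * math.sqrt(f_hz)


def dielectric_loss_db(f_hz, k_d):
    """Loss in dB for D(f) = exp(-k_d * f)."""
    return NEPER_TO_DB * k_d * f_hz


def crossover_hz(k_s, k_d):
    """Frequency above which dielectric loss exceeds skin loss."""
    return (k_s / k_d) ** 2


# Hypothetical coefficients: crossover placed at 1 GHz for illustration.
K_S = 1.0e-5                     # per sqrt(Hz)
K_D = K_S / math.sqrt(1.0e9)     # per Hz

if __name__ == "__main__":
    for f in (1e8, 1e9, 4e9):
        print(f"{f:.0e} Hz: skin {skin_loss_db(f, K_S):.2f} dB, "
              f"dielectric {dielectric_loss_db(f, K_D):.2f} dB")
```

On log(dB) vs log(f) axes the two terms are straight lines with slopes 0.5 and 1: quadrupling the frequency doubles the skin loss in dB and quadruples the dielectric loss.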


[Figure: amplitude (0.0 to 1.0) vs. frequency (log scale, 1k to 10G)] [YFW82]


Three-element  equivalent 
circuit  of  a  conductor  with 
skin  loss 


Non-equalized  NRZ  data 


[Figure: x-axis: time [ns], 0.0 to 20.0]


Skin  Loss  Equalization  at  Transmitter 

[Figure labels: pulse after every transition; usable signal; before] [FMW97]


6dB  Equalized  Data 


[Figure: 1 meter FR4, 4 mil trace; x-axis: time [ns]]


Skin  Loss 


[Figure: skin-loss attenuation A(f) vs. frequency (3M to 10G, log scale), marking the peak attenuation of 8 Gb/s, 4 Gb/s and 2 Gb/s "1010" patterns]


Data rate vs distance and tan(δ)


[Figure: x-axis: distance [meters]]


Communication Trends


[Figure: link data rate (20M to 20G, left axis) and Internet Host Count (20K to 20M, right axis) vs. Year of Publication (ISSCC), 1987 to 1999]


Example Multiphase RX Block Diagram


[WHK98]


Measurement  of  a  Multi-phase  System 


Reported Jitter: 8 ps rms, 44 ps pk-pk at 3.5 Gb/s.

Measurement of photo shows 26 ps difference between
widest and narrowest eye, so the true eye margin lost in an
end-to-end system (the scaled 44 ps pk-pk jitter plus 2 × 26 ps)
is about 118 ps, or a total eye closure of 41%.

Attention  to  delay  matching  is  critical! 
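
The 41% figure can be checked from the quoted numbers alone: at 3.5 Gb/s one bit period is about 286 ps, and 118 ps of that is lost. A small Python check (the function name is illustrative):

```python
def eye_closure_fraction(closure_ps, bit_rate_bps):
    """Fraction of one bit period lost to jitter and eye-width mismatch."""
    bit_period_ps = 1e12 / bit_rate_bps
    return closure_ps / bit_period_ps


if __name__ == "__main__":
    fraction = eye_closure_fraction(118.0, 3.5e9)
    print(f"eye closure: {fraction:.1%} of the bit period")
```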


Techniques  to  Improve  Delay  Matching 
and  Power  Supply  Noise  Immunity 


•  Rising  edge  delay 
dependent  on 
absolute\-x 


•  Rising  edge  delay 
dependent  on 
absolute  My 


•  Delay  depends  on 
VT  matching 


•  RC  time-constant 
changes  with 
supply  voltage 


•  Current  source 
absorbs  supply 
voltage  changes 


CMOS  Scaling  Issues 

•  Gate  delay  no  longer  scales  with  process 


See:  Chenming  Hu,  “Low-Voltage  CMOS  Device  Scaling"  1994  ISSCC  Digest,  pp  86-87. 


CMOS  Scaling  Issues  (continued) 

• VT doesn't track with power supply, so we gradually lose
ability to make ECL-like differential circuits.

• Full-swing circuits show worse delay matching than ECL-like
topologies.

• Full-swing circuits show worse power-supply delay
modulation than differential circuits.

• VT matching gets worse due to statistical dopant variations in
channel.

• All of these trends make power supply noise
rejection and multi-phase alignment more difficult
with each process scaling.


Power  and  die  size  vs  target 


[Figure: published bipolar and CMOS links, 1991 to 1999; y-axis: 0.1 to 50 (log scale); x-axis: Link Speed per unit Die Size [Gb/sq-mm], 10m to 10; target marked "1 Tb/s in 40 W"]


Industry  Trends 

• 50% of all U.S. families now have home computers

• Computer performance has surpassed needs of most users:
witness the drop of PC prices in the last 3 years from a
stable $2K down to $500 levels.

• Internet host count was doubling every 6 months in 1988, is
now doubling every 24 months - we are clearly past the
50% adoption point in the growth curve.

• What applications will continue to drive expensive and exotic
improvements in interconnect technology?

• Without a new "killer app" to drive development, we
may be stuck with the limitations of FR4/CMOS for
quite some time.


Viability  of  “exotic”  technologies 

• Yielded CMOS parts come in at $10/cm^2

• Tb/s chip-chip links are probably feasible in the next few
years.

• This performance can be achieved with existing BGA
packages across commodity FR-4 PC backplanes.

• The incremental cost of a Tb/s link in these applications will
be about $18 + connector cost.

• For optical solutions to take hold in these
applications, they must provide either significantly
higher performance (>10 Tb/s) or cheaper system
cost (not likely).
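
The $18 figure appears to follow from the silicon area given to the links. A short sketch, assuming the 9 cm^2 die and 20% link area from the Key Specifications slide (that pairing is an inference, not stated on this slide):

```python
CMOS_COST_PER_CM2 = 10.0    # yielded CMOS cost, $/cm^2 (this slide)
DIE_AREA_CM2 = 9.0          # from the Key Specifications slide
LINK_AREA_FRACTION = 0.20   # from the Key Specifications slide


def link_silicon_cost():
    """Dollar cost of the die area spent on 1 Tb/s of link cells."""
    return CMOS_COST_PER_CM2 * DIE_AREA_CM2 * LINK_AREA_FRACTION


if __name__ == "__main__":
    print(f"incremental silicon cost: ${link_silicon_cost():.0f} + connector cost")
```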


Conclusions 


• Still much work to be done, but 1 Tb/s chip I/O seems an
attainable target.

• 5 Gb/s on 1 meter PCB is the fastest that can be feasibly
supported for the foreseeable future with low latency.

• Fiber seems to be progressing along either a 1-10-100-1000-10,000 MHz
or a 622-2488-10,000 MHz
evolutionary path. There may be an economically
important need for 5 Gb/s links.

•  10  Tb/s  chip  I/O  is  probably  out  of  the  question  for 
current  high-volume  technologies  (CMOS,  FR-4  PCB). 
Computer  designs  and  programs  may  have  to  give  up 
cache  coherency,  and  move  towards  cooperative 
computing  architectures  to  break  out  of  this  limitation. 


10:00am - 10:30am
Mon, 10 May - 1.4


•  Introduction  to  Digital  Cross-connects 


•  Applications,  Requirements,  etc. 

•  Interconnect  Requirements 

•  Performance,  Physical,  Reliability,  Cost 

•  Interconnect  Choices 

•  Conclusions 


4/16/99


HSI'99 rph-Tellabs


Wideband Cross-Connect


APPLICATIONS

• Groom and Fill

• Restoration

• SONET Gateway

• Switch Cut

• PM & Test Access

• Facilities detection

• WDCS is a 5-stage switch:
Time-space-space-space-time

Time slot interchange done at
the ports


[Figure labels: OC-48 Rings; BIT RATE]


FEATURES

• DCS receives electrical or
optical signals

• Payload mapping

• DS1 to VT1.5
• DS3 to STS1 SPE

• Payload Transformation
VT1.5-DS1-DS3-STS1 SPE

• Test Access

• Scalable from 32 to > 2000
DS3 ports




Broadband  Cross-Connect 


•  BDCS  granularity  is  DS  3  and  ports  are  from  OC-12  to  OC-48 




Requirements

• Performance, all interconnects

• Interconnect rate up to 4 Gbps

• BER < 10^-15

• Jitter - < 0.1 dB peak (complicated issue)

• Port-to-Core Interconnect

• Physical distance between Ports and Core up to 300 meters

• Core module connected to four ports

• Within Core

• Distances < 10 meters

• Module I/O up to 128 Gbps (one card 12 x 16)

• Space

• Core Modules

• Connected to switch end stage modules

• 200 pins of connector (~8" of connector)

• End Stage Modules

• Core Connection - 2 x 200 pin connector

• 4 Port connections (duplex) at 4 Gbps each (optical)

• Port Modules

• Two 4 Gbps Connections (duplex) to Core

Requirements,  cont. 


Reliability

• Definition of Failure

• BER spec is < 10^-15. If BER exceeds the spec, has the
link failed?

• End-to-End connectivity: From Bellcore
GR-499-CORE - "A period of
unavailable time begins when the bit-error rate in
each second is worse than 10^-3 for a period of 10
consecutive seconds. These ten seconds are
considered to be unavailable time".

• DCS port-to-port DT < 0.1
min/connection/year

• FIT < 380 for the connection
• 1+1 connection redundancy

• Module Failure Rate < 0.25/yr

• Transceiver MTBF > 16 years (FIT < 7000)
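
The FIT and MTBF figures above are two views of the same quantity. A brief Python sketch of the conversion, assuming the standard definition of FIT as failures per 10^9 device-hours and 8760 hours per year; the function names are illustrative.

```python
HOURS_PER_YEAR = 8760
FIT_HOURS = 1e9     # one FIT = one failure per 1e9 device-hours


def mtbf_years(fit):
    """Mean time between failures, in years, for a given FIT rate."""
    return FIT_HOURS / fit / HOURS_PER_YEAR


def fit_from_annual_rate(failures_per_year):
    """FIT rate equivalent to a failure rate given per year."""
    return failures_per_year / HOURS_PER_YEAR * FIT_HOURS


if __name__ == "__main__":
    print(f"FIT 7000 -> MTBF {mtbf_years(7000):.1f} years")
    print(f"FIT 380  -> MTBF {mtbf_years(380):.0f} years")
    print(f"0.25 failures/yr -> FIT {fit_from_annual_rate(0.25):.0f}")
```

A FIT of 7000 corresponds to an MTBF a little over 16 years, which matches the transceiver bullet.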


Cost 

•  Number  of  interconnects 

• DCS size: 128 to 1056 ports

• There are ~1600 interconnects in a
128x128 DCS

• Cost

• Systems interconnect cost could
exceed $1M

• Allowable cost depends upon
functionality

• FEC and CDR included?

• Top-Down View

(System per-port price) / (margin factor) = BOM

BOM - $(core & mech & control) =
$ interconnect



Optical Interconnect Options


• Small form factor single channel
transceiver modules

• TO Can based modules - semi-cost-effective ($85/Gbps), reliable

• Bit rate - 1.25 Gbps (2.5 Gbps
coming)

• Conclusion: Still too much board
space required

• Other high-speed Serial link
Options

• 5 Gbps serial link

• Not there yet (10 Gbps Ethernet)

• Parallel Optical Interconnects

• Advantages

• Size, performance

• Disadvantages

• Availability - single source

• Immature Product

• Need for deskewing

• Costs > $200/Gbps


• Muxed (WDM) Interconnect

• Advantages

• Multi-Gbps through single connector

• Disadvantages

• Cost and availability

• Etc.


• Issues

• VCSEL vs edge-emitter

• Short vs long wavelength

• Single vs multimode fiber

• Discrete vs integrated transmit and
receive modules

• Duplex links - integrated

• Performance - discrete



Structure of the POI


Simplex POI

[Figure labels: OE Transmitter, Blind Mate, Cable-end Connector, 3R Receiver]

• POI Consists of FIVE Elements:

• OE Transmitter Module

• OE Receiver Module

• Connector

• Blind Mate / Backplane Shroud

• Connectorized Cable


DCS Optical Interconnect
Specifications


Port-to-Core distances dictate the
use of Optics

• Physical

• Size: L x W x H ≈ 2" x 1" x 0.5"

• 0 - 85 °C (ambient) Operating
Temperature Range


4 Gbps internal transport unit
interconnect options
(# channels x bit rate):

1 x 4 Gbps: $$$

2 x 2 Gbps: maybe

4 x 1 Gbps: just right

5 x .8, 8 x .5, 10 x .4: less optimum


Suggested Format: 8X connector
(MT)

8X ribbon cable, Blind Mate

Suggested Bit Rate - 1
Gbps/fiber

• 4 x 1 Gbps format

• 8X transmitter module with 2
channels OR

• 4X transmit 4X receive
(transceiver module)




Electrical Interconnects

Costs dictate the use of electrical interconnects (over optics)

Approximately $35/Gbps (dominated by ICs) = 1/10th cost of Optics


• Cables

• Twin-ax - diff. pair over 10 meters at >
1 Gbps

• Advantages - $, availability, ruggedness, etc.

• Disadvantages - cable management

• Microstrip Ribbon Cable

• Advantages - management

• Disadvantages - $, connectors

• Connectors

• 2 mm modular PGA systems 6 - 8 rows

• ADV: mult. vendors, $, modularity, reliable


• Card-edge connectors

• DIS: availability, cost, perform.,
assy. procedure modifications

• Boards - 2.5 Gbps signals for short
distances on FR4.

• Getek or cyanate ester better board
material

• Teflon or polymer laminate with
µstrip lines

• Line Drivers and Recovery Chips

• Required at Gbps rates




Summary and Conclusions

• "The whole system is an interconnect!"

• Electrical interconnects

• Speed and density demands are there: 10 Gbps ports in Terabit systems

• System design must be cognizant of interconnects from the beginning

• Board I/O and cable density are issues

• Uniformity and stability of media is increasingly important

• Driver and recovery chips are vital

• Optical

• Fast Serial and parallel are needed

• Connector systems need improvements

• Cost HAS to come down

• Opportunity for more functionality in optics - lower system cost




Moore's Law: The Intra-System
I/O Challenge


10:30am - 11:00am
Mon, 10 May - 1.5


Craig  Theorin 
May  10,  1999 


© W.L. Gore & Associates Inc., 1999


Overview 

•  Introduction:  The  implications  of  Moore 
+  Amdahl. 

•  Link  Architecture  Options 

•  Copper  Media  Scalability 

•  Fiber  Optics  Scalability 

•  Conclusion 




Moore  +  Amdahl  =  Bandwidth  Growth 


[Figure: Intel processor growth vs. year, 1975 to 2000 (log scale). Source: Intel]


• Moore Observes Exponential MIPS Growth.

•  Amdahl  Necessitates  Proportional  BW  Growth  to 
leverage  Moore. 




Amdahl's Law: Processor, I/O, Memory Balance


Processor    I/O Bandwidth    Memory
(MIPS)       (Mbit/sec)       (Mbytes)

System performance is optimized when MIPS = Mbit/sec = Mbytes
If processors scale with Moore's Law, so must I/O and Memory


Intra-System  Data  Bandwidth  Trends 


Year 


Keeping Pace with Moore


Year   Bandwidth   Bit Width   Rise Time   Spectrum   Zo discont   Ch-Ch Skew
       (Gbps)      (psec)      (psec)      (GHz)      (A.U.)       (psec)
1999      1.0        1000        250          1.4       1.00         250
2000      1.6         630        157          2.2       0.63         157
2001      2.5         397         99          3.5       0.40          99
2002      4.0         250         63          5.6       0.25          63
2003      6.3         157         39          8.9       0.16          39
2004     10.1          99         25         14.1       0.10          25
2005     16.0          63         16         22.4       0.06          16
2006     25.4          39         10         35.6       0.04          10
2007     40.3          25          6         56.4       0.02           6
2008     64.0          16          4         89.6       0.02           4
2009    101.6          10          2        142.2       0.01           2
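
The legible entries of this table are consistent with a few simple scaling rules, sketched below in Python. The rules are inferred from the numbers rather than stated on the slide, so treat them as assumptions: bandwidth doubles every 18 months from 1 Gbps in 1999, rise time is one quarter of the bit width, spectrum is 0.35 divided by the rise time, the tolerable impedance discontinuity scales as 1/bandwidth, and channel-to-channel skew equals the rise time.

```python
def scaling_row(year, base_year=1999, base_gbps=1.0, doubling_years=1.5):
    """Link parameters for one year under the assumed scaling rules."""
    bw_gbps = base_gbps * 2 ** ((year - base_year) / doubling_years)
    bit_ps = 1000.0 / bw_gbps        # one bit period in picoseconds
    rise_ps = bit_ps / 4.0           # assumed: rise time = bit width / 4
    return {
        "year": year,
        "bandwidth_gbps": bw_gbps,
        "bit_width_ps": bit_ps,
        "rise_time_ps": rise_ps,
        "spectrum_ghz": 350.0 / rise_ps,   # 0.35 / rise time, in GHz
        "zo_discont_au": 1.0 / bw_gbps,
        "skew_ps": rise_ps,
    }


if __name__ == "__main__":
    for year in range(1999, 2010):
        r = scaling_row(year)
        print(f"{r['year']}  {r['bandwidth_gbps']:6.1f}  {r['bit_width_ps']:5.0f}  "
              f"{r['rise_time_ps']:4.0f}  {r['spectrum_ghz']:6.1f}  "
              f"{r['zo_discont_au']:.2f}  {r['skew_ps']:4.0f}")
```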



Serial  vs.  Parallel  Data  Streams 


•  Serialization  converts 
media  cost  to  launch 
cost 

•  long  reach  applications. 


•  Parallel  =  N*Serial. 

• Maximum I/O BW (i.e. 10-100x serial) & I/O
BW density.

•  BW  Reduction  for  Serial 
Stream  Processing. 


Parallel  will  be  continually  obsoleted  by  increasing  SerDes 
performance/cost  ratio. 

A  decline  in  serialization  performance/cost  growth  may  require 

parallel. 



SerDes  Cost/Performance 


Ganged  Serial  vs.  Clock  Forwarding 

• Jitter budget has limited 1 Gbps optical clock
forwarding.

•  Clock  and  Data  Jitter  accumulate  in  budget 

•  HiPPI  Budget  TBD. 

•  FO  centric  designs  will  end  clock  forwarding. 

• Future High BW Links will look like hybrid of
parallel and serial or "ganged serial".

•  Allow  deskew  of  parallel  data  streams. 

•  Scalable  for  future  systems. 


Data  Coding 

• Typical Code Functions:

- Limit low frequency content (i.e. run length) for AC coupling.

- "DC-Balance" the signal to keep duty cycle close to 50%.

• Scrambling: Muxing a PRBS w/ Data

- Statistical max run length and DC balance.

• Multi-Level Coding

- Lowers max frequency by increasing # of bits/symbol.

• Forward Error Correction (FEC)

- Using Error Correction Bits to improve BER
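
As an illustration of the scrambling bullet, here is a minimal Python sketch of an additive scrambler that XORs the data with a PRBS 2^7-1 sequence. The choice of PRBS7 (taps at stages 7 and 6), the seed and the function names are assumptions made for the example; the slide does not specify a particular scrambler.

```python
def prbs7(seed=0x7F):
    """Yield the bits of a PRBS 2^7-1 sequence (7-bit LFSR, taps 7 and 6)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def scramble(bits, seed=0x7F):
    """XOR data bits with the PRBS; applying it twice restores the data."""
    gen = prbs7(seed)
    return [b ^ next(gen) for b in bits]


def max_run_length(bits):
    """Length of the longest run of identical consecutive bits."""
    longest = run = 0
    previous = None
    for b in bits:
        run = run + 1 if b == previous else 1
        previous = b
        longest = max(longest, run)
    return longest


if __name__ == "__main__":
    idle = [0] * 254                      # worst case for AC coupling
    line = scramble(idle)
    print("run length before:", max_run_length(idle))
    print("run length after: ", max_run_length(line))
    print("ones per 127 bits:", sum(line[:127]))
```

With real data the run length and DC balance are only statistical guarantees, as the slide notes, because the data can in principle cancel the PRBS.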




Copper Scalability


SI Concern            Cause                               Solutions
Loss, Eye Pattern     Skin Effect & Loss Tan              Larger Cable, EQ, ePTFE, Peaking
EMI                   Poor Shielding vs. Tr, Imbalance    More Shielding, RF Chokes
Return Loss           Zo Discontinuity                    Control, Signal Shielding
NEXT/FEXT             Poor Shielding vs. Tr               More Shielding between Signals
Skew (pair-to-pair)   Signal Routing, Er variation        Control, Deskew Circuits
Imbalance             EMI, Jitter                         Control, More Shielding?

• A common solution is to increase Tr
• "Control", Larger Cable, More Shielding = Cost


Bandwidth  Scalability 


[Figure: 850 Mbps, 20 Meters, 100 Ohm, 26 AWG]


Gb  Ethernet  Copper  Modems 


• 1 Gbps data transmitted over 4 pairs

• 5 Level Code (PAM5) used to send 2 bits/symbol

• Extra bits for forward error correction (FEC)

• Hybrids are used to bi-directionally couple data into
cable.

• Waveform "shaping" for EMI compliance

• Noise Reduction Through DSP

• NEXT and FEXT

• Digital Echo (reflection)
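
The first two bullets fix the symbol rate on each pair. A short Python check, using only the numbers on the slide (the function names are illustrative):

```python
import math


def symbol_rate_baud(data_bps=1e9, pairs=4, bits_per_symbol=2):
    """Symbol rate each pair must carry to reach the aggregate data rate."""
    return data_bps / (pairs * bits_per_symbol)


def raw_bits_per_symbol(levels=5):
    """Information capacity of one symbol with the given number of levels."""
    return math.log2(levels)


if __name__ == "__main__":
    print(f"symbol rate per pair: {symbol_rate_baud() / 1e6:.0f} Mbaud")
    print(f"PAM5 raw capacity: {raw_bits_per_symbol():.2f} bits/symbol")
```

Five levels carry about 2.32 bits per symbol, so sending 2 data bits per symbol leaves headroom, which is where the extra bits for error correction come from.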




Optics  Scalability 

SI Concerns          Cause                            Solutions
Optical Eye          Optical Noise from Source        Improved VCSELs and Launch
Budget vs. BW        Higher BW Challenges Budget      Longer Wavelengths
Jitter Budget        Source Jitter & Rx BW & Noise    Improved VCSELs and Receiver
Clock/Data Jitter    Sum of clock + data jitter       Control, Ganged Serial Transmission


• 10 GbE is currently addressing this issue for Serial Optics to 10
Gbps.

• For greater BW, Parallel FO is required,
- At a cost


VCSEL  Scaling 


12.5 Gbps (PRBS 2^7-1) Optical Eye

• VCSEL BW > 20 GHz.


5.0 Gbps (PRBS 2^7-1) Optical Eye


Parallel  Optic  Packaging 


Parallel  Optics  Eye  Safety 


Tx optical power limited by FDA & IEC.

Standard 250 um pitch 12 wide arrays force potential 6 dB
power reduction for safety compliance.

[Figure labels: N.A.; Numerical Aperture]


Increasing  Beam  Divergence 


Far field distribution of a 0° VCSEL launch.

Far field distribution of a high angle VCSEL launch.


Summary 

• Moore & Amdahl Require 10 Gbps in 5 years and
100 Gbps in 10 years.

• Leverage IC Functionality to solve analog
transmission problems.

• DSP?

• Encoding, Error Recovery, Peaking, Deskew, etc.

• Copper EMI challenges will be extreme.

• Parallel Optic data links will address BW needs.

• Ganged Serial Likely for future Intra-System I/O


NOTES 


Tuesday, 
11  May  1999 


8:15am - 8:45am
Tues, 11 May - 2.1


Interconnects in Scalable,
Distributed Multiprocessor Systems

Jeffrey Kuskin
Silicon Graphics, Inc.


Outline

• Overview of scalable multiprocessor system architecture and issues

• Cache coherence in action

• Origin 2000 network details

• Multiprocessor interconnects in the future


Communication Mechanisms

• Defines the convention used to communicate among nodes

• Message passing

• Each node has direct access only to its local memory

• Communication between nodes is requested explicitly

• Examples: Intel Paragon, Thinking Machines CM-5, IBM SP2

• Shared memory

• Physically separate memories appear as a single, unified memory

• Each node may access any memory location using normal loads/stores

• Examples: HP/Convex Exemplar, SGI Origin 2000, Stanford DASH


Cache Coherence Problem

[Figure label: Memory]
Memory 


Distributed Cache Coherence 1

[Figure: Requesting Node and Home Node]

1: Read request (16B)

2: Read reply (144B)


Distributed Cache Coherence 2

1: Read request (16B)

2: Data forward request (16B)

3a: Read reply (144B)

Cache Coherence Implications

Protocol messages are small
Network latency is crucial

• CPUs do not tolerate large memory read latencies (getting better)

• Impact tends to be non-linear


[Figure: x-axis: Network Latency]

Network bandwidth also important, but less so

• Needed at basic level to match memory and CPU bus bandwidth

• Needed to cope with communication bursts (typical, especially in
scientific codes)


Cache  Coherence  in  Action:  SGI  Origin  2000 

Processing  nodes  contain  2  R10K  CPUs,  memory,  and  network  interface 
Network implemented via 6-port routers (details to follow)

Hypercube  system  topology;  often  multiple  routers  per  network  traversal 


16  CPUs 


32  CPUs 


Origin 2000 Network Detail 1

Origin 2000 Network Detail 2

• Router characteristics

• 6 ports connected via a crossbar

• Each port bidirectional at 800 MB/s in each direction

• Provides 4 virtual channels

• Input-buffered with pipelined crossbar arbitration

• Best-case (fall-through) input-to-output latency of 50 ns

• Links (per direction)

• 20 data bits, 2 clock bits (differential), 1 data framing bit

• Clock rate of 200 MHz, sampled on both edges (400 MHz data rate)

• Credit-based flow control

• Sliding-window, CRC-based error detection/retransmission

• Implementation

• 850K-gate ASIC

• IBM CMOS 5L (0.5µ drawn), 5 metal layers

• 160 mm^2 die area

• Core operates at 3.3V, 100 MHz

• 29 W worst-case power dissipation

• Cables

• Shielded, electrically-matched wires, 1-5 m

• Expensive :-(
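
The link and router figures on the Network Detail 2 slide can be related with a little arithmetic, sketched here in Python. The raw rate is simply data bits times data rate; the slide does not say how the raw rate maps to the quoted 800 MB/s, so the ratio is reported without interpretation. The latency helper counts only router fall-through time and ignores wire and interface delays.

```python
DATA_BITS = 20              # data bits per link direction
DATA_RATE_HZ = 400e6        # 200 MHz clock sampled on both edges
QUOTED_MBYTES_PER_S = 800   # per-port rate quoted on the slide
ROUTER_FALLTHROUGH_NS = 50  # best-case input-to-output latency


def raw_link_mbytes_per_s():
    """Raw signalling rate of one link direction, in MB/s."""
    return DATA_BITS * DATA_RATE_HZ / 8 / 1e6


def router_latency_ns(routers):
    """Best-case latency contributed by the routers alone."""
    return routers * ROUTER_FALLTHROUGH_NS


if __name__ == "__main__":
    raw = raw_link_mbytes_per_s()
    print(f"raw link rate: {raw:.0f} MB/s, quoted: {QUOTED_MBYTES_PER_S} MB/s "
          f"({QUOTED_MBYTES_PER_S / raw:.0%} of raw)")
    for n in range(1, 4):
        print(f"{n} router(s): at least {router_latency_ns(n)} ns")
```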

Origin 2000 Latency/Bandwidth Characteristics

[Figure: Unloaded Latency (ns) and Pipelined Bandwidth (MB/s), 0 to 900, vs. Number of Routers (0 to 3)]


Future Trends

• Communication ever-more critical to overall system performance

• Bandwidth demands growing

• CPU bandwidth growing, both of system bus and functional units

• Memory system bandwidth growing: SDR, DDR, DRDRAM

• Network latency becoming more of a problem

• Decreasing in absolute time

• But increasing when measured in CPU instruction issue slots

• Latency impact on overall performance is non-linear

• Will interconnection network become primary limit on overall system
performance?

Trend: Merging of Network Interface and CPU

• Desire to move network interface "closer" to the CPU

• Architecturally

• User-level, protected access ("OS bypass")

• Tied more closely to memory system (address translation, etc.)

• Physically

• Place on same die as CPU

• Direct datapaths between CPU internals and network interface

• Challenges

• Development of reasonable interface to user jobs

• Electrical, mechanical, physical integration of CPU logic and network
interface


Trend: Active Networks

• Current multiprocessor networks are "passive"

• Message unchanged as it flows through network

• Network does not interpret message contents

• Result: network acts mainly as a delay element (though a useful one!)

• Idea: perform computation in the network as well as on CPUs

• Benefits

• Moves computation closer to the data on which it operates

• Offloads CPUs

• Challenges

• Programming model, compiler and OS support, protection, etc.

• Details of computational resources, integration into network fabric, etc.


Conclusions


• Interconnect is a key component of multiprocessor system performance

• Interconnect latency and bandwidth are both important

• Low latency especially critical for cache coherence

• Bandwidth for message passing, clustering, traffic bursts

• Future interconnects must continue to improve latency and bandwidth

• By coupling the network more closely to the CPU

• By (eventually) making the networks "active"


8:45am - 9:15am
Tues, 11 May - 2.2


The  Role  of  Optics 
in 

Balanced  Computer  System  Design 

Mike  Chastain 
Hewlett-Packard 
chastain@rsn.hp.com


Mike Chastain


Workshop on Interconnections Within High Speed Digital Systems


April 29, 1999


Parallel Fiber Optic Development


The advantages of parallel fiber versus copper interconnect are well known
• Physical size reduction at high frequency
• Connectors and cables

• Greater communication distance

• Reduced susceptibility to EMI and EMC

Computer industry has watched parallel fiber development for five years

• Costs have always prevented widespread system insertion
Computer industry is also slow to adopt new interconnect technologies

• Waits for technology cost crossover, or some "external" forcing function

Industry investment is making parallel fiber more viable

• Real products now appearing from multiple vendors

• Costs are starting to come down
• Breakthroughs in manufacturing and packaging
• Optimistic projections of high volume insertion

• Costs are still high relative to copper for short (<10 m) links

Are there other forces, outside optical development, that may hasten insertion?


Consider CPU Performance


The industry is now increasing CPU performance at an exponential rate
• Single chip CPUs breaking the Gigahertz barrier, and beyond
• Single chip CPUs incorporating "super-computer architecture tricks"

Increasing CPU performance driving corresponding increase in bandwidths

• Soon CPUs may require 8 GB/sec, or more, to sustain performance

Increasing bandwidths forcing maximum frequency at all CPU and ASIC pins
• Designers struggling to maintain reasonable pin counts for manufacturing

• Intel's endorsement of Rambus is an indication of a pervasive problem

[Figure: log-scale plot, 0.1 to 1000, vs. year, 1994 to 2004]


Consider Copper Interconnect Limits


At today’s interconnect frequencies (up to ~1 GHz)

•Primary frequency dependent loss mechanism is skin effect

• Proportional to √f

At interconnect frequencies beyond 1 GHz
•Dielectric loss starts to dominate
•Impact greater; dielectric loss increases linearly with f
At interconnect frequencies approaching 2.5 GHz

• Interconnect distance may be limited to a single backplane or PC planar
•New low loss PCB materials will be required

At interconnect frequencies approaching 5.0 GHz
•PCB interconnects may no longer be practical

Copper  cables  are  still  an  option;  for  now 

•  Designers  will  trade  copper  trace  for  cable  to  increase  interconnect  distance 

•  4-5”  of  PCB  trace  is  roughly  equivalent  in  loss  to  3  feet  of  copper  cable 

•  Parallel  copper  cables  will  still  be  limited  to  adjacent  racks 

• Six to ten meters at 622 MHz, dropping linearly with frequency

• Machine room level interconnects are already in jeopardy without parallel fiber!
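
The scaling rules on this slide can be sketched numerically. This is a rough illustration only: the function names and the 1 GHz reference are hypothetical, and reading "dropping linearly with frequency" as cable reach inversely proportional to frequency is an assumption, not something the talk states.

```python
import math


def skin_effect_loss_ratio(f_hz, f_ref_hz):
    """Skin-effect loss grows with the square root of frequency."""
    return math.sqrt(f_hz / f_ref_hz)


def dielectric_loss_ratio(f_hz, f_ref_hz):
    """Dielectric loss grows linearly with frequency."""
    return f_hz / f_ref_hz


def cable_reach_m(f_hz, reach_at_ref_m, f_ref_hz=622e6):
    """Assumed model: reach falls in inverse proportion to frequency."""
    return reach_at_ref_m * f_ref_hz / f_hz


if __name__ == "__main__":
    # Slide figure: six to ten meters of parallel copper cable at 622 MHz.
    for f in (1.25e9, 2.5e9, 5.0e9):
        short = cable_reach_m(f, 6.0)
        long_ = cable_reach_m(f, 10.0)
        print(f"{f / 1e9:.2f} GHz: "
              f"skin x{skin_effect_loss_ratio(f, 1e9):.2f}, "
              f"dielectric x{dielectric_loss_ratio(f, 1e9):.2f}, "
              f"reach {short:.1f}-{long_:.1f} m")
```

Under this assumed model the reach at 2.5 GHz is roughly a quarter of the 622 MHz figure, which is in line with the slide's point that parallel copper stays confined to adjacent racks.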



Consider Server Packaging Density


Increased interconnect frequencies coupled with greater interconnect losses
•Driving  system  designers  to  reduce  interconnect  distances 
•Driving  system  designers  to  increase  system  packaging  density 

To  achieve  increased  packaging  density 

• More ASIC integration to minimize component count
•More  CPUs,  ASICs,  and  RAM  per  PCB  area 

(Frequency  *  Density)  is  driving  power  density  to  the  limits 

• More gates at greater frequency ==> more power density!

• More high speed I/O ==> more power density!

•More  CPUs,  ASICs,  and  RAM  per  PCB  area  =>  more  power  density! 

System  power  density  will  soon  exceed  machine  room  limitations 

•By 2002-3, (4 CPUs + ASICs + 16 GB DRAM + Power) => ~850 watts

• Existing rooms are designed for 40-70 W/sq.ft. with an 18” raised floor
•Floor area + service area => 19” rack occupies ~14 sq.ft. => 980 watts max

•New standards are still inadequate (125 W/sq.ft., 36” raised floor suggested)

Result:  Packaging  density  limited  by  machine  room  for  foreseeable  future 
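
The rack budget arithmetic above can be checked with a short sketch. The numbers (14 sq.ft. footprint, 40-70 and 125 W/sq.ft., ~850 W per 4-CPU node) come from the slide; the function and constant names are hypothetical.

```python
def rack_power_budget_w(floor_area_sqft, room_w_per_sqft):
    """Power a machine room can supply and cool for one rack footprint."""
    return floor_area_sqft * room_w_per_sqft


RACK_FOOTPRINT_SQFT = 14  # 19-inch rack plus service area (slide figure)
NODE_POWER_W = 850        # 4 CPUs + ASICs + 16 GB DRAM + power (slide figure)

if __name__ == "__main__":
    for density in (40, 70, 125):
        budget = rack_power_budget_w(RACK_FOOTPRINT_SQFT, density)
        nodes = budget // NODE_POWER_W
        print(f"{density} W/sq.ft.: {budget} W per rack, "
              f"{nodes} node(s) of {NODE_POWER_W} W")
```

At 70 W/sq.ft. the budget is the 980 W quoted on the slide, enough for a single such node per rack, which is the basis for the claim that the machine room limits packaging density.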



Future  Server  Designs  ? 


May  therefore  consist  of  medium  (CPU  count)  SMPs 

•  High  frequency  signaling  on  all  interconnects 

•  High  integration  (and  high  power)  silicon 

•  High  density  packaging,  power  input,  and  power  dissipation 

Tightly  coupled  electrically,  but  not  physically 
•Tightly integrated coherent cable interconnects
•Utilize  copper  until  frequency-versus-distance  becomes  prohibitive 

•  Shift  to  optical  as  frequency  increases  and/or  costs  come  down 
• Frequency may cause a shift in spite of costs!

•  Shift  to  optical  now  for  machine  room  level  interconnects 

•  Such  as  the  emerging  Future  I/O  standard 

To  balance  system  performance  versus  machine  room  constraints 

•  Spread  system  across  multiple  racks  to  distribute  thermal  load 
•Match  machine  room  capabilities  (power/sq.ft.  electrical  and  thermal) 

•Perhaps  integrated  with  storage  and  I/O  components 

• Good volume utilization without adding significant power/sq.ft.




The Role of Optics in Server Evolution


Optical  interconnects  within  servers 

• Evolutionary “copper replacement” strategy as frequency increases

•  System  architects  must  work  closely  with  the  optical  link  community 

•  Copper  designs  must  be  compatible  with  optical  link  limitations 

•  Optical  components  must  be  compatible  with  server  manufacturing 

•  Optical  must  be  consistent  with  server  connector  requirements 

• Evolutionary “EMI/EMC management” strategy as frequency increases

Optical  interconnects  between  servers  and/or  I/O  within  a  machine  room 
•Addressed  by  the  emerging  Future  I/O  standard 

•  Parallel  optical  links  at  1/2/4  GB/sec  data  delivery;  up  to  300m  (@1GB) 

•  Network  like  protocols  optimized  for  both  cluster  and  I/O  communication 

•  Designed  for  highly  reliable,  fault  tolerant  communication 

•  Designed  to  enable  sharing  of  storage,  networks,  and  other  I/O 

Optical  interconnects  between  machine  rooms 

•Still  addressed  by  existing  and  evolving  LAN/WAN  infrastructures 
•Tightly bridged to the emerging Future I/O standard




Server Architects must Design for Optics


Server  architects  are  starting  to  design  within  limits  of  available  optics 

•Accepting 12 bit wide links as a cost effective limit

• Leveraging telecom volumes for connectors and cables

•Accepting per-bit encoding and self-clocking for AC coupled links

•  Designing  in  clock  recovery  and  link  training  sequences 

•Accepting  multiple  bit  time  skews  between  end  points 
•Performing  parallel  word  re-assembly  in  end  points 

• Accepting a “non-zero BER” at high frequencies
•Designing  in  transparent  link  retry  and  ECC  recovery  mechanisms 
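
As a rough illustration of what "transparent link retry" could look like, the sketch below frames a payload with a checksum and resends it until the receiver sees a clean copy. Everything here is hypothetical: the function names, the 4-byte trailer, and the retry limit are illustrative, and a CRC-32 error check stands in for the ECC recovery the slide mentions.

```python
import zlib


def frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 trailer to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check(frame_bytes: bytes):
    """Return the payload if the trailer matches, otherwise None."""
    payload, trailer = frame_bytes[:-4], frame_bytes[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") == trailer:
        return payload
    return None


def send_with_retry(payload, channel, max_attempts=4):
    """Resend until a clean frame arrives; the caller never sees bad data."""
    for attempt in range(1, max_attempts + 1):
        received = check(channel(frame(payload)))
        if received is not None:
            return received, attempt
    raise RuntimeError("link failed after retries")


def make_flaky_channel(bad_transfers):
    """Test channel that flips one bit in the first `bad_transfers` frames."""
    state = {"remaining": bad_transfers}

    def channel(data: bytes) -> bytes:
        if state["remaining"] > 0:
            state["remaining"] -= 1
            corrupted = bytearray(data)
            corrupted[0] ^= 0x01  # single-bit error, always caught by CRC-32
            return bytes(corrupted)
        return data

    return channel


if __name__ == "__main__":
    data, attempts = send_with_retry(b"cache line", make_flaky_channel(2))
    print(data, "delivered after", attempts, "attempts")
```

The design point is that the retry loop lives in the link layer, so the logic above it can treat a link with a non-zero bit error rate as if it were error free, at the cost of occasional added latency.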




Optical Vendors must Design for Servers


Optical  link  frequency  has  been  driven  by  the  telecom  industry 
•Telecom road map is 4x per generation; 622 MHz, 2.5 GHz, 10 GHz

• Server road map is 2x per generation; 1.25 GHz, 2.5 GHz, 5 GHz, 10 GHz(?!)

Optical  link  packaging  is  not  consistent  with  server  environments 
•Server  power  is  generally  noisy;  Optical  links  want  clean  power 
•Servers  (most)  rely  on  forced  air  convection  for  thermal  management 

•  Optical  interfaces  cannot  assume  heat  conduction  to  PCB 

•  Server  manufacturing  relies  on  robotic  assembly  and  test 

•  Optical  interfaces  should  support  standard  pick&place  BGA  processes 
•Servers need blind-mate optical connectors; with EMI containment!

•2nd  level  assemblies  to  accomplish  blind-mate/containment  are  expensive 

Servers  need  “transparent”  optical  links 

•Server  silicon  must  be  re-used;  copper  links  may  become  optic  links 

•  Different  “products”  must  make  different  distance-cost  trade-off 
•Electrical  interfaces  consistent  with  (same  as)  copper  cable  interfaces 

• Same frequency, encoded self-clocked, low voltage differential interfaces
•Few (if any) special system considerations beyond equivalent copper cable

• Example: special system requirements to handle “eye safety”



Summary 


Parallel optical links are finally close to reality, but costs are still high

Parallel  optical  links  (and  Future  I/O)  will  address  machine  room  interconnects 

CPU  frequency  and  associated  dielectric  losses  will  drive  server  density  upward 
But, existing machine room capability will limit the power density per sq.ft.
Future  servers  may  trade  PCB  trace  for  cables  and  distribute  the  power  density 
But  dielectric  loss  also  reduces  copper  cable  length  proportional  to  frequency 
Therefore server designers may have (non-cost) reason to use internal optical links

Server  and  optical  link  designers  must  work  together  to  enable  a  smooth  transition 




9:15am - 9:45am
Tues, 11 May - 2.3


In  Pursuit  of  a  Petaflop: 

Overcoming  the 
Bandwidth/Latency  Wall 
with  PIM  Technology 
In  the  HTMT  Project 

Dr.  Peter  M.  Kogge 

McCourtney Prof. of CS & Engineering
IBM  Fellow,  IEEE  Fellow 
CSE  Dept,  Univ.  of  Notre  Dame 

10th Workshop on Interconnections, Santa Fe SNTAFE99.PPT 1


Computer Performance

1  Teraflop  =  10^12  Flops/second 
- = 1,000 “peak” 1999 workstations
Fastest  Computers  in  world  today: 
-ASCI  Red:  1  Teraflop  peak 
-ASCI  Blue:  3  Teraflops  peak 
1 Petaflop = 1,000 Teraflops

- What a 1 GF machine can do in 30 years takes
a Petaflop machine 15 minutes
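
The 30-years-versus-15-minutes comparison follows directly from the factor of one million between a gigaflop and a petaflop. A minimal check, assuming a 365.25-day year and perfect scaling (both assumptions; the names are hypothetical):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def runtime_seconds(total_operations, flops):
    """Time to finish a fixed amount of work at a sustained rate."""
    return total_operations / flops


# Work completed by a 1 gigaflop machine running for 30 years.
work = 1e9 * 30 * SECONDS_PER_YEAR

# Same work on a 1 petaflop machine, in minutes.
petaflop_minutes = runtime_seconds(work, 1e15) / 60

if __name__ == "__main__":
    print(f"{petaflop_minutes:.1f} minutes")
```

The result is a little under 16 minutes, consistent with the rounded figure on the slide.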



Thesis

Modern  technology:  The  Memory  Wall 

-  Latency:  cannot  access  data  fast  enough 

-  Bandwidth:  cannot  get  data  to  logic  fast  enough 
Next  level  of  supercomputing-  Petaflops: 

-  Impossible  without  radical  change 

A  direct  assault  on  the  problem  -  HTMT 

-  Hybrid  Technology,  MultiThreaded 

-  Mix  memory  &  logic,  interconnect  optically 



What are Principal Barriers
to Petaflops?


Physics  Driven: 

-  Memory  Density 

-  Memory  Bandwidth 
-Memory  Latency 

Semantics  Driven  (Programming  Model) 

-  Expressing  million  way  parallelism 
-Without  global  synchronization 




Latency  in  a  Single  System 


[Figure: CPU clock period (ns) and memory system access time by year; tick labels include 2001 and 2003]


THE  WALL 


Bandwidth 


Today’s CPUs: need upwards of 3 billion
bytes  of  data  per  second 

-  And  almost  doubling  each  year! 

Today’s  cheap  memory:  at  most  1/30  of  this 

-  And  increase  at  only  perhaps  7%  per  year 


Problem! Need either

- Lots of chips

- Lots of pins on each chip

-  Both 
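
The widening gap can be sketched by compounding the two growth rates quoted on the slide. The sketch treats "almost doubling" as exactly 2x per year, which is an assumption, and the function name and parameters are hypothetical.

```python
def bandwidth_gap(years_from_now, start_ratio=30.0,
                  cpu_growth=2.0, memory_growth=1.07):
    """Ratio of CPU bandwidth demand to what one cheap memory part supplies.

    start_ratio:   today's gap (memory delivers at most 1/30 of demand)
    cpu_growth:    yearly growth in CPU demand (assumed exactly 2x)
    memory_growth: yearly growth in memory bandwidth (7% per year)
    """
    return start_ratio * (cpu_growth / memory_growth) ** years_from_now


if __name__ == "__main__":
    for year in range(0, 6):
        print(f"year +{year}: about {bandwidth_gap(year):.0f} memory parts "
              f"needed per CPU for bandwidth alone")
```

Under these assumptions the gap grows by a factor of about 1.87 every year, which is why the slide's choices reduce to more chips, more pins, or both.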




Solution: 


Mixing  Significant  Logic  and  Memory  on  same  chip 
Huge improvements in latency (10X) & bandwidth (100X)


3D  Binary  Hypercube 
Compute  Nodes  SIMD/MIMD  on  a  chip 
OF  THE  “CPU"  on  ONE  Chip 

=>  Opportunities  for  New  Scalable  Architectures 


PIM  Key:  Memory  Macro 


Multi MB independent units

- Separate address decoding

-  Separate  refresh 

-  Separate  test  and  redundancy 

Sub  25  ns  access  to  1K-4K  full  rows 


Sub  10  ns  access  to  128-1024b  wide  words 
2.4  -  4+  GBps  bandwidth  potential  per 




Recent  PIM  Technology 


IBM 7LD, 10/97

- 0.25u DRAM-based process with 5 LM

- 2 MB/macro in less than 30 mm²

Silicon Access, 3/98

- IP ownership of highly variable embeddable DRAM macro

- TSMC & UMC alliances for 0.25u

- 2 MB/macro in less than 23 mm²

Samsung, 7/98

- 0.25u ASIC, stacked capacitor configurable DRAM, 5 LM

-  Up  to  128  Mb/ chip  with  mix  of  SRAM,  Flash,  other  macros 
IBM SA-27E, 3/99

- 0.18u logic process with embedded trench DRAM array, 6 LM copper

- 2 MB is 20 mm²

-  ASIC  design  flow,  rich  library  of  macros 



A  Real  DRAM  PIM 




Chip  Count 


Design  Space: 

Memory “+” Logic


PIM VLSI constraints (as of today):

• Approx. [illegible] years between SIA announce & volume PIM parts

• DRAM logic: 1/2 speed of conventional, but approaching same levels of metal




Key  PIM  Subsystems 


• Stacks of on-chip memory

•  Key  Subsystems  that  make  up  logic 

-  At  the  Sense  Amps  Accessing  Logic  (ASAP) 

-  “CPU-like”  local  GP  processing  (Core) 

• Node: Memory stack + ASAP + Core + I/F

-  Potentially  many  nodes  per  PIM  chip 

•  Interfaces 

-  Intra  chip  links  between  nodes  (floorplanning) 

-  Inter  multiple  PIM  chip  systems 

-  Between  PIM  and  non-PIM  subsystems 



SHAMROCK:  Floorplanning  for 
High Inter Node Bandwidth


Current  ND  PIM  Projects 




A Peta Machine:

[Figure: number of chips (logarithmic scale; 10,000,000 and 100,000,000 ticks visible) versus Year of Technology Availability; legend entries appear to cover a petabyte of memory and a petaflop of logic]

It’s The Memory, Stupid!


Ratio  of  Memory  Chips  to 


[Figure: two memory-capacity series plotted by year, 2001 to 2012; legend text not recoverable]

Year  of  Technology  Availability 
Memory  Light  but  Decent  Ratio  &  Parts  Count 



The  Bandwidth  Problem  for 
Peta Machines


HTMT:  A  Petaflop  in 
2004-2006  Timeframe 


•  Multi-Institution  program  dating  back  to  1994 

•  Harness  emerging  technologies 

- RSFQ for 100 GHz “CPUs”

-  PIM  for  smart  memory  hierarchy 

-  WDM  all  optical  network  for  interconnect 

-  Holographic  memory  for  dense  storage 

•  With  latency-tolerant  architectures 

-  MultiThreading 

-  Parcels  for  “in  the  memory”  function  execution 




The  HTMT  Facility 

(Courtesy  L  Bergman,  Cal  Tech) 


A Replicable HTMT Unit


9:45am - 10:15am
Tues, 11 May - 2.4


Ultra-High  Speed  Interconnections 
Network  for  Supercomputing 


Keren  Bergman 
Princeton  University 


Sponsors: DOD, JPL/NASA, Caltech, NSF, ONR


technologies research group:
Mark Arend
Nathan Kutz (Univ. Wash.)

architecture and system design:
Brandon Collings (now Lucent)
Coke Reed (PU Math/IDA)
Jeff Roth
Charles Fefferman (PU Math)
Qimin Yang
Suzanne Sears
Ricky Leng
Dmitriy Krylov
Keir Neuman
John Hesse (Interactic)

•  HTMT  Petaflops  project  overview 

•  Data  Vortex  optical  network 

•  Experimental  test  bed 

• Modular packaging and interfaces

•  Summary  &  future  directions 


[Figure: HTMT system diagram; legible labels include holographic storage, CRAM, interconnect, buffers, and SRAM]


• 16,000 PIM I/O ports

• 256 Gbit/sec per port

• sustained throughput
bandwidth ~ petabyte

•max latency < 100 ns

•variable  packet  size 
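
The aggregate figure can be cross-checked from the port count and per-port rate listed above. The sketch assumes every port runs at full rate simultaneously; the function name is hypothetical.

```python
def aggregate_bytes_per_second(ports, bits_per_second_per_port):
    """Total network throughput if every port runs at full rate."""
    return ports * bits_per_second_per_port / 8


total = aggregate_bytes_per_second(16_000, 256e9)

if __name__ == "__main__":
    print(f"{total / 1e15:.3f} petabytes per second")
```

With these inputs the total is about half a petabyte per second, the same order of magnitude as the petabyte-scale throughput listed on the slide.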


10:30am - 11:00am
Tues, 11 May - 2.5


Ultrafast Optical Interconnect
Based  on  Routing  by  “Clockwork” 
in  Regular  Mesh  Networks 


D Cotter(1), F Chevalier(2) and D Harle(2)
(1) BT Laboratories, UK
(2) University of Strathclyde, UK


Introduction 

•  Ultrafast  interconnection  network  for  multi-processor 
systems  (future  massive-capacity  routers  and 
servers) 

•  Multi-stage  packet-switched  network 

-  fixed-length  packets,  serial  bit  rates  0.1-1  Tbit/s 

- routing and header processing 'on the fly' in the optical
domain 

-  no  buffering  in  the  optical  domain 

- contention-free in the optical domain

•  Routing  and  processing  mechanisms  as  simple  as 
possible 

-  regular  topology 

-  'clockwork'  routing 


Outline 

•  ‘Clockwork’  routing  in  the  Manhattan  street  network 

•  Node  architecture 

•  Performance 

• Special properties significant for future high-performance
multi-processor systems

- ultra-low latency signalling (e.g. acknowledgement)

-  bandwidth  reservation 

-  process  scheduling 


Suitable topologies [animated slide]

•  Eulerian  network  digraph 

-  decomposition  into  a  set  of  distinct  closed 
directed  trails 

•  Manhattan  Street  network 

-  every  node  is  topologically  equivalent 


Global  states  and  transitions 

[animated slide]


Routing by ‘clockwork’ [animated slide]


Link length = (qN+1) time
slots, q = 0, 1, 2, ...

- a packet that leaves a node
in time slot j arrives at the
next node in time slot
(j+1) mod N

Packet  is  routed 
automatically  by  ‘clockwork’ 

-  no  routing  decisions  needed 
at  intermediate  nodes 
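
The slot arithmetic behind this slide can be checked in a few lines. The sketch assumes a frame of N time slots and a link whose length is one slot more than a whole number of frames; the names `arrival_slot`, `q`, and `n_slots` are hypothetical labels for the quantities on the slide.

```python
def arrival_slot(departure_slot, q, n_slots):
    """Slot in which a packet arrives at the next node.

    The link is (q * n_slots + 1) slots long, so the whole frames cancel
    under the modulus and the packet always lands one slot later.
    """
    link_length = q * n_slots + 1
    return (departure_slot + link_length) % n_slots


if __name__ == "__main__":
    n = 8
    for q in (0, 1, 5):
        hops = [arrival_slot(j, q, n) for j in range(n)]
        print(f"q={q}: departure slots 0..{n - 1} arrive in slots {hops}")
```

Because the arrival slot depends only on the departure slot, each node can switch according to the slot number alone, which is what allows intermediate nodes to forward packets without reading a header or making a routing decision.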




Summary 

Strategy  for  optical  packet  routing  in  high-speed 
mesh  interconnect 

- throughput comparable with store-and-forward

-  trivial  processing  at  nodes 

-  no  hidden  queues  inside  the  network 

• effective access control for bursty (self-similar) traffic

- automatic return-path routing

• ultra-low latency signalling

• process scheduling


LARGE-SCALE PHOTONIC PACKET SWITCH
USING WAVELENGTH ROUTING TECHNIQUES

[Slide titles; figures not recoverable]
WDM star-based switch architecture
Two alternative configurations of WX/PWD
Configuration of time-slot selector (TSS)
Composite optical/electrical buffer configuration
Packet loss in composite output buffer
Packet reception for phase fluctuation
320-Gbit/s system configuration
Role of electronics in photonic packet switch
Block diagram of control-bit extractor
Experiments using broadcast-and-select switch
Experiments using wavelength-routing switch




7:00pm - 7:30pm
Tues, 11 May - 2.8


Lightwave Technology at Hughes Aircraft

Worldwide Optoelectronic Component Production


Distribution of OE Companies by State


Optoelectronic  Industries 


OIDA


World  OE  Production  ($B)  1998 

(Total: $140.6 B, North America Estimate: $42.8 B)


OIDA  Programs 




OIDA Workshops

1998
  Metrology for OE                                  Feb 98
  Optical Communications Roadmap                    May 96
  Technology Roadmap for Image Sensors              Jun 98
  Annual Forum                                      Oct 98

1999
  Broadband Communications & Switching
  Components Technology                             Apr 99
  Advanced Imaging - “Electronic Eye”               Jun 99
  International Standards                           Aug 99
  Annual Forum                                      Oct 99


Layers of Communication
Technology Roadmap


Socioeconomic Driving Forces


Communications System Architecture
[illegible] of Critical OE Components

Number of Students Placed in Industry from
OE DARPA Centers 1990-1996

Center    Students
NCIPT     71
OTC       48
CMC       60
COST      47
Total     226


Market Strengths of Japan and the US

■ North American Strengths
» Communications
» Industrial uses
» Military applications


■ Japanese Strengths
» Consumer applications
» Displays
» Storage


■ Emerging Opportunities!
» Imaging
» New Information Age applications
» Medical technology
» Transportation


Board  &  Back-plane  Level  Optical  Interconnections 
Using  Integrated  Thin-cladding  Polymer  Fibers 


Yao  Li 

NEC  Research  Institute, 

4  Independence  Way,  Princeton,  NJ  08540. 
e-mail: yao@research.nj.nec.com


7:30pm - 8:00pm
Tues, 11 May - 2.9


Other  Contributors  &  Collaborators 


Jun Ai          NECI, USA
Jan Popelek     NECI, USA
K. Kasahara     NEC CRL, Japan
Y. Takiguchi    Hamamatsu KK, Japan

© IEEE-Santa Fe, 05/11/99


Talk  Outline 


* Introduction,

* POF’s as Short-distance Optical Channels,
POF’s for Intra-computer Interconnections,

Project I: multi-Gb/s on-board clock distributions.
Project II: 2D parallel optical circuits on PCB.

* Some experiments,

* Summary and Conclusions.


Introduction 


* Bandwidth bottleneck at PCB level,

(>500 MHz on-chip & <200 MHz off-chip)

* Problems of conventional waveguides,

(high cost for glass waveguides, large loss for polymer ones)

* Space parallelism can be supplied by VCSEL’s,

(1D & 2D arrays with low fabrication cost, low threshold current)

* POF offers low-cost, high-rigidity, low-loss,

(1/4 of glass fiber cost, breakage-free, < 3 dB/m)

* Low-cost Polymer fiber-image-guides (PFIG’s)
are also becoming commercially available


Basics  of  Polymer  Optical  Fibers 


* 1st POF in 1970's but progress was slow,

* Main applications now in display & lighting.


* Low material & production cost,

(1/4 of cost of silica fibers)

* High attenuation, (PMMA: 12 dB/100 m @ 650 nm),

* Thin-cladding (90% core) & Multimodes,

* Low operating temperature (-20 to 80 °C),

* High flexibility and rigidity against breakage.
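
To put the quoted PMMA attenuation in board-level terms, the sketch below converts the per-100 m figure into loss over short runs. The function names and the example lengths are hypothetical; only the 12 dB per 100 m value at 650 nm comes from the slide.

```python
def attenuation_db(length_m, db_per_100m=12.0):
    """Fiber loss in dB over a given length (PMMA figure from the slide)."""
    return db_per_100m * length_m / 100.0


def transmitted_fraction(loss_db):
    """Fraction of optical power remaining after a loss given in dB."""
    return 10 ** (-loss_db / 10.0)


if __name__ == "__main__":
    for length in (0.1, 0.5, 1.0, 100.0):
        loss = attenuation_db(length)
        print(f"{length:6.1f} m: {loss:6.3f} dB, "
              f"{transmitted_fraction(loss) * 100:5.1f}% of power remains")
```

Over half a meter the loss is a small fraction of a dB, which illustrates why an attenuation that rules POF out for long links is of little consequence at board and backplane distances.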


Two Interconnect Projects at NECI
for Board-level POF Circuits


* Multi-Gb/s Optical Clock Distribution Circuit,
(10  Gb/s,  128  port,  connectorized  integrated  optics) 


*  2D  Parallel  Optical  Circuits  for  VCSEL  Arrays, 

(both  point-to-point  and  multi  point  capability) 
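
For a sense of the power budget implied by a 128-port clock distribution, the sketch below computes the loss of an ideal equal power split. It assumes a lossless 1-to-N splitter, which is an idealization and not a description of the fabricated circuit; the function name is hypothetical.

```python
import math


def ideal_split_loss_db(ports):
    """Per-port loss of a lossless, equal 1-to-N optical power split."""
    return 10.0 * math.log10(ports)


if __name__ == "__main__":
    for n in (2, 16, 128):
        print(f"1:{n} split: {ideal_split_loss_db(n):.2f} dB per port")
```

An ideal 128-way split alone costs about 21 dB per port, so any excess loss in the fibers and couplers comes on top of that figure.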


Board-level  Optical  Clock  Distribution  Using  End-tapered  Fiber  Bundles 


Power-loss distribution; eye diagrams


Photo  of  the 
fabricated  board 


Same  board  when 
in  operation 


Power  &  Resolution  Performance  Measures 


Optical Loss of 16-channel Butterfly
and Shuffle Interconnects

[Figure: bar chart comparing Butterfly and Shuffle interconnects; horizontal axis: Channel Number (0 to 15)]


Imaging  Result 


Transfer  Function 


Summary  and  Conclusions 

POF suits better for short-distance applications,
POF  offers  better  packaging  capabilities. 

Multi  Gb/s  bandwidth  is  sustainable, 

Free-space optics can add value to POF circuits.
Packaging  capability  determines  practicality. 




Wednesday, 
12  May  1999 




